1 Load & Install Packages

if (!require("nnet")) install.packages("nnet")
## Loading required package: nnet
## Warning: package 'nnet' was built under R version 4.2.3
if (!require("summarytools")) install.packages("summarytools")
## Loading required package: summarytools
## Warning: package 'summarytools' was built under R version 4.2.3
if (!require("dplyr")) install.packages("dplyr")
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 4.2.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
if (!require("ggplot2")) install.packages("ggplot2")
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.2.3
if (!require("tidyverse")) install.packages("tidyverse")
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'tidyr' was built under R version 4.2.3
## Warning: package 'readr' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## Warning: package 'stringr' was built under R version 4.2.3
## Warning: package 'forcats' was built under R version 4.2.3
## Warning: package 'lubridate' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ✔ readr     2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tibble::view()  masks summarytools::view()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require("lubridate")) install.packages("lubridate")
if (!require("mapview")) install.packages("mapview")
## Loading required package: mapview
## Warning: package 'mapview' was built under R version 4.2.3
if (!require("sf")) install.packages("sf")
## Loading required package: sf
## Warning: package 'sf' was built under R version 4.2.3
## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
if (!require("geojsonio")) install.packages("geojsonio")
## Loading required package: geojsonio
## Warning: package 'geojsonio' was built under R version 4.2.3
## Registered S3 method overwritten by 'geojsonsf':
##   method        from   
##   print.geojson geojson
## 
## Attaching package: 'geojsonio'
## 
## The following object is masked from 'package:base':
## 
##     pretty
if (!require("leaflet")) install.packages("leaflet")
## Loading required package: leaflet
## Warning: package 'leaflet' was built under R version 4.2.3
if (!require("broom")) install.packages("broom")
## Loading required package: broom
## Warning: package 'broom' was built under R version 4.2.3
if (!require("plotly")) install.packages("plotly")
## Loading required package: plotly
## Warning: package 'plotly' was built under R version 4.2.3
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(nnet)
library(summarytools)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(lubridate)
library(mapview)
library(sf)
library(geojsonio)
library(leaflet) 
library(broom)
library(plotly)

2 Dataset description

The Fire Incident Dispatch Data file contains data that is generated by the Starfire Computer Aided Dispatch System. The data spans from the time the incident is created in the system to the time the incident is closed in the system. It covers information about the incident as it relates to the assignment of resources and the Fire Department’s response to the emergency. To protect personal identifying information in accordance with the Health Insurance Portability and Accountability Act (HIPAA), specific locations of incidents are not included and have been aggregated to a higher level of detail.

In this analysis we restrict attention to the last 50,000 observations.

  1. STARFIRE_INCIDENT_ID: An incident identifier comprising the 5-character Julian date, the 4-character alarm box number, the 2-character number of incidents at the box so far that day, the 1-character borough code, and the 4-character sequence number.
  2. INCIDENT_DATETIME: The date and time of the incident.
  3. ALARM_BOX_BOROUGH: The borough of the alarm box.
  4. ALARM_BOX_LOCATION: The location of the alarm box.
  5. ALARM_BOX_NUMBER: The alarm box number.
  6. INCIDENT_BOROUGH: The borough of the incident.
  7. ZIPCODE: The zip code of the incident.
  8. POLICEPRECINCT: The police precinct of the incident.
  9. CITYCOUNCILDISTRICT: The city council district.
  10. COMMUNITYDISTRICT: The community district.
  11. COMMUNITYSCHOOLDISTRICT: The community school district.
  12. CONGRESSIONALDISTRICT: The congressional district.
  13. ALARM_SOURCE_DESCRIPTION_TX: The description of the alarm source.
  14. ALARM_LEVEL_INDEX_DESCRIPTION: The alarm level index.
  15. HIGHEST_ALARM_LEVEL: The highest alarm level.
  16. INCIDENT_CLASSIFICATION: The incident classification.
  17. INCIDENT_CLASSIFICATION_GROUP: The incident classification roll up group. (response)
  18. DISPATCH_RESPONSE_SECONDS_QY: The elapsed time in seconds between the incident_datetime and the first_assignment_datetime.
  19. FIRST_ASSIGNMENT_DATETIME: The date and time of the first unit assignment.
  20. FIRST_ACTIVATION_DATETIME: The date and time of the first unit acknowledgement of the assignment.
  21. FIRST_ON_SCENE_DATETIME: The date and time of the first unit at the scene of the incident.
  22. INCIDENT_CLOSE_DATETIME: The date and time that the incident was closed in the dispatch system.
  23. VALID_DISPATCH_RSPNS_TIME_INDC: Indicates that the components comprising the generation of the DISPATCH_RESPONSE_SECONDS_QY are valid.
  24. VALID_INCIDENT_RSPNS_TIME_INDC: Indicates that the components comprising the generation of the INCIDENT_RESPONSE_SECONDS_QY are valid.
  25. INCIDENT_RESPONSE_SECONDS_QY: The elapsed time in seconds between the incident_datetime and the first_onscene_datetime.
  26. INCIDENT_TRAVEL_TM_SECONDS_QY: The elapsed time in seconds between the first_assignment_datetime and the first_onscene_datetime.
  27. ENGINES_ASSIGNED_QUANTITY: The number of engine units assigned to the incident.
  28. LADDERS_ASSIGNED_QUANTITY: The number of ladder units assigned to the incident.
  29. OTHER_UNITS_ASSIGNED_QUANTITY: The number of units that are not engines or ladders that were assigned to the incident.
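
The identifiers that actually appear in this extract (for example 230905-B1937-001-0567) look hyphen-separated, so they can be split into parts with base R. The sketch below is only an illustration: the helper parse_incident_id and the field names (date, borough_code, box, count, sequence) are assumptions inferred from the sample values, not taken from the data dictionary.

```r
# Hypothetical helper: split a hyphen-separated incident id into its parts.
# The field names are illustrative guesses based on the sample ids.
parse_incident_id <- function(id) {
  parts <- do.call(rbind, strsplit(id, "-", fixed = TRUE))
  data.frame(
    date         = as.Date(parts[, 1], format = "%y%m%d"),
    borough_code = substr(parts[, 2], 1, 1),
    box          = as.integer(substr(parts[, 2], 2, nchar(parts[, 2]))),
    count        = as.integer(parts[, 3]),
    sequence     = as.integer(parts[, 4]),
    stringsAsFactors = FALSE
  )
}

ids <- c("230905-B1937-001-0567", "230905-X8897-003-0480")
parsed <- parse_incident_id(ids)
print(parsed)
```

In the sample rows the box part agrees with the ALARM_BOX_NUMBER column, which is what suggests this reading of the identifier.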

For this analysis the response variable is INCIDENT_CLASSIFICATION_GROUP, which contains the main groups of incidents:

  1. Structural Fires
  2. NonStructural Fires
  3. NonMedical Emergencies
  4. NonMedical MFAs
  5. Medical Emergencies
  6. Medical MFAs

3 Data Exploration and Cleaning

The first step is to read the dataset and print its first 6 observations.

fire_data <- read.csv("datasets/Fire_Incident_Dispatch_Data_last_50k.csv")

head(fire_data)
##    STARFIRE_INCIDENT_ID      INCIDENT_DATETIME ALARM_BOX_BOROUGH
## 1 230905-B1937-001-0567 09/05/2023 02:19:04 PM          BROOKLYN
## 2 230905-B3923-002-0568 09/05/2023 02:19:36 PM          BROOKLYN
## 3 230905-X8897-003-0480 09/05/2023 02:19:43 PM             BRONX
## 4 230905-X3466-001-0481 09/05/2023 02:21:00 PM             BRONX
## 5 230905-B2448-001-0570 09/05/2023 02:21:26 PM          BROOKLYN
## 6 230905-B2448-002-0571 09/05/2023 02:22:35 PM          BROOKLYN
##   ALARM_BOX_NUMBER                    ALARM_BOX_LOCATION INCIDENT_BOROUGH
## 1             1937                AUTUMN AVE & FULTON ST         BROOKLYN
## 2             3923          N/S EASTERN PWAY & UTICA AVE         BROOKLYN
## 3             8897 CROSS BX EXPY- DEEGAN EX TO JEROME AV            BRONX
## 4             3466                  ADEE AVE & BX PARK E            BRONX
## 5             2448             GLENWOOD RD & BEDFORD AVE         BROOKLYN
## 6             2448             GLENWOOD RD & BEDFORD AVE         BROOKLYN
##   ZIPCODE POLICEPRECINCT CITYCOUNCILDISTRICT COMMUNITYDISTRICT
## 1   11208             75                  37               305
## 2   11213             71                  35               309
## 3      NA             NA                  NA                NA
## 4   10467             49                  12               211
## 5   11210             70                  45               314
## 6   11210             70                  45               314
##   COMMUNITYSCHOOLDISTRICT CONGRESSIONALDISTRICT ALARM_SOURCE_DESCRIPTION_TX
## 1                      19                     7                         EMS
## 2                      17                     9                     CLASS-3
## 3                      NA                    NA                     EMS-911
## 4                      11                    15                         EMS
## 5                      22                     9                         EMS
## 6                      22                     9                         EMS
##   ALARM_LEVEL_INDEX_DESCRIPTION HIGHEST_ALARM_LEVEL
## 1                 Initial Alarm         First Alarm
## 2                 Initial Alarm         First Alarm
## 3                 Initial Alarm         First Alarm
## 4                DEFAULT RECORD         First Alarm
## 5                DEFAULT RECORD         First Alarm
## 6                DEFAULT RECORD         First Alarm
##                  INCIDENT_CLASSIFICATION INCIDENT_CLASSIFICATION_GROUP
## 1 Medical - No PT Contact EMS is Onscene           Medical Emergencies
## 2                          Hospital Fire              Structural Fires
## 3               Vehicle Accident - Other        NonMedical Emergencies
## 4               Medical - EMS Link 10-91           Medical Emergencies
## 5               Medical - EMS Link 10-91           Medical Emergencies
## 6               Medical - EMS Link 10-91           Medical Emergencies
##   DISPATCH_RESPONSE_SECONDS_QY FIRST_ASSIGNMENT_DATETIME
## 1                            7    09/05/2023 02:19:12 PM
## 2                           95    09/05/2023 02:21:11 PM
## 3                           41    09/05/2023 02:20:25 PM
## 4                          298    09/05/2023 02:25:59 PM
## 5                           25    09/05/2023 02:21:52 PM
## 6                          350    09/05/2023 02:28:25 PM
##   FIRST_ACTIVATION_DATETIME FIRST_ON_SCENE_DATETIME INCIDENT_CLOSE_DATETIME
## 1    09/05/2023 02:19:26 PM  09/05/2023 02:25:23 PM  09/05/2023 03:03:15 PM
## 2    09/05/2023 02:21:33 PM  09/05/2023 02:23:21 PM  09/05/2023 02:34:18 PM
## 3    09/05/2023 02:20:35 PM  09/05/2023 02:26:22 PM  09/05/2023 04:13:32 PM
## 4    09/05/2023 02:26:04 PM                          09/05/2023 02:34:23 PM
## 5    09/05/2023 02:22:08 PM                          09/05/2023 02:28:07 PM
## 6                                                    09/05/2023 02:29:09 PM
##   VALID_DISPATCH_RSPNS_TIME_INDC VALID_INCIDENT_RSPNS_TIME_INDC
## 1                              N                              Y
## 2                              N                              Y
## 3                              N                              Y
## 4                              N                              N
## 5                              N                              N
## 6                              N                              N
##   INCIDENT_RESPONSE_SECONDS_QY INCIDENT_TRAVEL_TM_SECONDS_QY
## 1                          378                           371
## 2                          224                           129
## 3                          398                           357
## 4                           NA                            NA
## 5                           NA                            NA
## 6                           NA                            NA
##   ENGINES_ASSIGNED_QUANTITY LADDERS_ASSIGNED_QUANTITY
## 1                         1                         0
## 2                         3                         2
## 3                         2                         3
## 4                         1                         0
## 5                         1                         0
## 6                         1                         0
##   OTHER_UNITS_ASSIGNED_QUANTITY
## 1                             0
## 2                             1
## 3                             1
## 4                             0
## 5                             0
## 6                             0
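
Some date-time fields in the output above are empty strings rather than NA. As a minimal sketch on a toy CSV (the two column names are taken from the real file, the rows are made up for illustration), the na.strings argument of read.csv can be used to read empty fields as missing values:

```r
# Toy CSV with the same column names as two fields of the real file.
csv_text <- c(
  "FIRST_ON_SCENE_DATETIME,INCIDENT_RESPONSE_SECONDS_QY",
  "09/05/2023 02:25:23 PM,378",
  ",",
  "09/05/2023 02:23:21 PM,224"
)

# Default behaviour: an empty character field stays an empty string.
raw <- read.csv(text = csv_text)

# With na.strings, empty fields are read as NA in character columns too.
clean <- read.csv(text = csv_text, na.strings = c("", "NA"))

sum(is.na(raw$FIRST_ON_SCENE_DATETIME))    # the empty string is not counted
sum(is.na(clean$FIRST_ON_SCENE_DATETIME))  # the empty string is now NA
```

This is an optional variation; the analysis itself keeps the default read.csv call.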

We use dfSummary from the summarytools package to get a complete and clear summary of the dataset.

print(dfSummary(fire_data, 
                varnumbers   = FALSE, 
                valid.col    = FALSE, 
                graph.magnif = 0.75),
                method = 'render')

Data Frame Summary

fire_data

Dimensions: 50000 x 29
Duplicates: 0

Variable                                    Stats / Values                Freqs (% of Valid)    Missing
STARFIRE_INCIDENT_ID [character]            1. 230905-B0042-001-1051      1 (0.0%)              0 (0.0%)
                                            2. 230905-B0053-001-0760      1 (0.0%)
                                            3. 230905-B0053-002-0910      1 (0.0%)
                                            4. 230905-B0081-001-1137      1 (0.0%)
                                            5. 230905-B0106-002-0632      1 (0.0%)
                                            6. 230905-B0132-001-0713      1 (0.0%)
                                            7. 230905-B0147-001-0967      1 (0.0%)
                                            8. 230905-B0160-001-1125      1 (0.0%)
                                            9. 230905-B0163-001-1026      1 (0.0%)
                                            10. 230905-B0165-001-0778     1 (0.0%)
                                            [ 49990 others ]              49990 (100.0%)
INCIDENT_DATETIME [character]               1. 09/07/2023 03:53:19 PM     3 (0.0%)              0 (0.0%)
                                            2. 09/11/2023 09:44:33 AM     3 (0.0%)
                                            3. 09/13/2023 12:09:35 AM     3 (0.0%)
                                            4. 09/29/2023 09:44:26 AM     3 (0.0%)
                                            5. 09/05/2023 03:30:51 PM     2 (0.0%)
                                            6. 09/05/2023 03:37:48 PM     2 (0.0%)
                                            7. 09/05/2023 03:53:11 PM     2 (0.0%)
                                            8. 09/05/2023 04:01:29 PM     2 (0.0%)
                                            9. 09/05/2023 04:32:57 PM     2 (0.0%)
                                            10. 09/05/2023 04:59:57 PM    2 (0.0%)
                                            [ 49364 others ]              49976 (100.0%)
ALARM_BOX_BOROUGH [character]               1. BRONX                      10973 (21.9%)         0 (0.0%)
                                            2. BROOKLYN                   13980 (28.0%)
                                            3. MANHATTAN                  12890 (25.8%)
                                            4. QUEENS                     9879 (19.8%)
                                            5. RICHMOND / STATEN ISLAND   2278 (4.6%)
ALARM_BOX_NUMBER [integer]                  Mean (sd) : 2930.3 (2446.5)   7411 distinct values  0 (0.0%)
                                            min ≤ med ≤ max:
                                            10 ≤ 2275 ≤ 9933
                                            IQR (CV) : 2772 (0.8)
ALARM_BOX_LOCATION [character]              1. 8 AVE & W 155 ST           85 (0.2%)             0 (0.0%)
                                            2. 10 RICHMAN PLZ/SEDGWICK A  75 (0.1%)
                                            3. AMSTERDAM AVE & LA SALLE   50 (0.1%)
                                            4. 3 AVE & E 143 ST           48 (0.1%)
                                            5. WASHINGTON AVE & E 170 ST  48 (0.1%)
                                            6. FDR DR & E 6 ST            45 (0.1%)
                                            7. CONCOURSE VILLAGE E & E 1  44 (0.1%)
                                            8. PARK AVE & E 158 ST        40 (0.1%)
                                            9. UNION TPK & WINCHESTER BL  40 (0.1%)
                                            10. 8 AVE & W 33 ST           39 (0.1%)
                                            [ 12203 others ]              49486 (99.0%)
INCIDENT_BOROUGH [character]                1. BRONX                      10973 (21.9%)         0 (0.0%)
                                            2. BROOKLYN                   13980 (28.0%)
                                            3. MANHATTAN                  12890 (25.8%)
                                            4. QUEENS                     9879 (19.8%)
                                            5. RICHMOND / STATEN ISLAND   2278 (4.6%)
ZIPCODE [integer]                           Mean (sd) : 10737.9 (551.8)   217 distinct values   3181 (6.4%)
                                            min ≤ med ≤ max:
                                            10000 ≤ 10472 ≤ 11697
                                            IQR (CV) : 1098 (0.1)
POLICEPRECINCT [integer]                    Mean (sd) : 62.3 (34.8)       77 distinct values    3180 (6.4%)
                                            min ≤ med ≤ max:
                                            1 ≤ 61 ≤ 123
                                            IQR (CV) : 56 (0.6)
CITYCOUNCILDISTRICT [integer]               Mean (sd) : 23.1 (15.1)       51 distinct values    3180 (6.4%)
                                            min ≤ med ≤ max:
                                            1 ≤ 21 ≤ 51
                                            IQR (CV) : 27 (0.7)
COMMUNITYDISTRICT [integer]                 Mean (sd) : 262.9 (119.4)     70 distinct values    3180 (6.4%)
                                            min ≤ med ≤ max:
                                            101 ≤ 302 ≤ 595
                                            IQR (CV) : 206 (0.5)
COMMUNITYSCHOOLDISTRICT [integer]           Mean (sd) : 14.8 (9.7)        32 distinct values    3182 (6.4%)
                                            min ≤ med ≤ max:
                                            1 ≤ 13 ≤ 32
                                            IQR (CV) : 18 (0.7)
CONGRESSIONALDISTRICT [integer]             Mean (sd) : 10.4 (3.3)        13 distinct values    3180 (6.4%)
                                            min ≤ med ≤ max:
                                            3 ≤ 11 ≤ 16
                                            IQR (CV) : 5 (0.3)
ALARM_SOURCE_DESCRIPTION_TX [character]     1. 911                        302 (0.6%)            0 (0.0%)
                                            2. 911TEXT                    14 (0.0%)
                                            3. BARS                       1 (0.0%)
                                            4. CLASS-3                    5025 (10.1%)
                                            5. EMS                        17178 (34.4%)
                                            6. EMS-911                    10520 (21.0%)
                                            7. ERS                        777 (1.6%)
                                            8. ERS-NC                     1 (0.0%)
                                            9. PHONE                      15146 (30.3%)
                                            10. SOL                       5 (0.0%)
                                            11. VERBAL                    1031 (2.1%)
ALARM_LEVEL_INDEX_DESCRIPTION [character]   1. 10-75 Signal (Request for  13 (0.0%)             0 (0.0%)
                                            2. 10-76 & 10-77 Signal (Not  3 (0.0%)
                                            3. 7-5 (All Hands Alarm)      100 (0.2%)
                                            4. DEFAULT RECORD             17313 (34.6%)
                                            5. Initial Alarm              32562 (65.1%)
                                            6. Second Alarm               8 (0.0%)
                                            7. Third Alarm                1 (0.0%)
HIGHEST_ALARM_LEVEL [character]             1. All Hands Working          100 (0.2%)            0 (0.0%)
                                            2. First Alarm                49891 (99.8%)
                                            3. Second Alarm               8 (0.0%)
                                            4. Third Alarm                1 (0.0%)
INCIDENT_CLASSIFICATION [character]         1. Medical - EMS Link 10-91   9509 (19.0%)          0 (0.0%)
                                            2. Medical - PD Link 10-91    5741 (11.5%)
                                            3. Medical - Breathing / Ill  5453 (10.9%)
                                            4. Medical - No PT Contact E  5013 (10.0%)
                                            5. Assist Civilian - Non-Med  4140 (8.3%)
                                            6. Alarm System - Unnecessar  2845 (5.7%)
                                            7. Elevator Emergency - Occu  1954 (3.9%)
                                            8. Vehicle Accident - Other   1543 (3.1%)
                                            9. Utility Emergency - Gas    1359 (2.7%)
                                            10. Odor - Other Than Smoke   1337 (2.7%)
                                            [ 57 others ]                 11106 (22.2%)
INCIDENT_CLASSIFICATION_GROUP [character]   1. Medical Emergencies        26824 (53.6%)         0 (0.0%)
                                            2. Medical MFAs               208 (0.4%)
                                            3. NonMedical Emergencies     19072 (38.1%)
                                            4. NonMedical MFAs            1680 (3.4%)
                                            5. NonStructural Fires        703 (1.4%)
                                            6. Structural Fires           1513 (3.0%)
DISPATCH_RESPONSE_SECONDS_QY [integer]      Mean (sd) : 40 (133.1)        841 distinct values   0 (0.0%)
                                            min ≤ med ≤ max:
                                            2 ≤ 19 ≤ 9023
                                            IQR (CV) : 33 (3.3)
FIRST_ASSIGNMENT_DATETIME [character]       1. 09/06/2023 01:40:49 PM     3 (0.0%)              0 (0.0%)
                                            2. 09/08/2023 02:43:53 PM     3 (0.0%)
                                            3. 09/20/2023 01:09:29 PM     3 (0.0%)
                                            4. 09/27/2023 02:04:41 PM     3 (0.0%)
                                            5. 09/05/2023 02:34:37 PM     2 (0.0%)
                                            6. 09/05/2023 03:38:43 PM     2 (0.0%)
                                            7. 09/05/2023 03:48:54 PM     2 (0.0%)
                                            8. 09/05/2023 03:56:07 PM     2 (0.0%)
                                            9. 09/05/2023 05:01:22 PM     2 (0.0%)
                                            10. 09/05/2023 07:13:08 PM    2 (0.0%)
                                            [ 49499 others ]              49976 (100.0%)
FIRST_ACTIVATION_DATETIME [character]       1. (Empty string)             139 (0.3%)            0 (0.0%)
                                            2. 09/22/2023 02:07:47 PM     4 (0.0%)
                                            3. 09/07/2023 06:59:12 PM     3 (0.0%)
                                            4. 09/10/2023 06:28:10 PM     3 (0.0%)
                                            5. 09/17/2023 07:27:03 PM     3 (0.0%)
                                            6. 09/23/2023 08:01:29 PM     3 (0.0%)
                                            7. 09/25/2023 10:17:28 AM     3 (0.0%)
                                            8. 09/29/2023 08:16:05 AM     3 (0.0%)
                                            9. 09/05/2023 02:47:25 PM     2 (0.0%)
                                            10. 09/05/2023 03:00:12 PM    2 (0.0%)
                                            [ 49196 others ]              49835 (99.7%)
FIRST_ON_SCENE_DATETIME [character]         1. (Empty string)             14112 (28.2%)         0 (0.0%)
                                            2. 09/30/2023 04:01:43 PM     3 (0.0%)
                                            3. 09/05/2023 03:17:20 PM     2 (0.0%)
                                            4. 09/05/2023 03:27:35 PM     2 (0.0%)
                                            5. 09/05/2023 04:37:22 PM     2 (0.0%)
                                            6. 09/05/2023 04:38:47 PM     2 (0.0%)
                                            7. 09/05/2023 04:39:30 PM     2 (0.0%)
                                            8. 09/05/2023 05:44:27 PM     2 (0.0%)
                                            9. 09/05/2023 05:55:56 PM     2 (0.0%)
                                            10. 09/05/2023 08:59:49 PM    2 (0.0%)
                                            [ 35543 others ]              35869 (71.7%)
INCIDENT_CLOSE_DATETIME [character]         1. 09/05/2023 06:13:06 PM     3 (0.0%)              0 (0.0%)
                                            2. 09/10/2023 02:16:37 PM     3 (0.0%)
                                            3. 09/24/2023 04:10:06 PM     3 (0.0%)
                                            4. 09/25/2023 12:20:57 AM     3 (0.0%)
                                            5. 09/27/2023 04:38:25 PM     3 (0.0%)
                                            6. 09/29/2023 10:42:38 AM     3 (0.0%)
                                            7. 09/30/2023 06:06:40 PM     3 (0.0%)
                                            8. 09/05/2023 03:25:13 PM     2 (0.0%)
                                            9. 09/05/2023 04:08:06 PM     2 (0.0%)
                                            10. 09/05/2023 05:08:09 PM    2 (0.0%)
                                            [ 49399 others ]              49973 (99.9%)
VALID_DISPATCH_RSPNS_TIME_INDC [character]  1. N                          50000 (100.0%)        0 (0.0%)
VALID_INCIDENT_RSPNS_TIME_INDC [character]  1. N                          17036 (34.1%)         0 (0.0%)
                                            2. Y                          32964 (65.9%)
INCIDENT_RESPONSE_SECONDS_QY [integer]      Mean (sd) : 380.7 (233.2)     1496 distinct values  14112 (28.2%)
                                            min ≤ med ≤ max:
                                            18 ≤ 334 ≤ 7130
                                            IQR (CV) : 161 (0.6)
INCIDENT_TRAVEL_TM_SECONDS_QY [integer]     Mean (sd) : 340.5 (208.6)     1382 distinct values  14112 (28.2%)
                                            min ≤ med ≤ max:
                                            0 ≤ 301 ≤ 7122
                                            IQR (CV) : 159 (0.6)
ENGINES_ASSIGNED_QUANTITY [integer]         Mean (sd) : 1.1 (0.8)         15 distinct values    62 (0.1%)
                                            min ≤ med ≤ max:
                                            0 ≤ 1 ≤ 19
                                            IQR (CV) : 0 (0.7)
LADDERS_ASSIGNED_QUANTITY [integer]         Mean (sd) : 0.6 (0.8)         12 distinct values    62 (0.1%)
                                            min ≤ med ≤ max:
                                            0 ≤ 0 ≤ 15
                                            IQR (CV) : 1 (1.4)
OTHER_UNITS_ASSIGNED_QUANTITY [integer]     Mean (sd) : 0.3 (0.8)         23 distinct values    62 (0.1%)
                                            min ≤ med ≤ max:
                                            0 ≤ 0 ≤ 32
                                            IQR (CV) : 0 (2.8)

Generated by summarytools 1.0.1 (R version 4.2.1)
2024-01-08

#print(dfSummary(fire_data), method = 'render')

Next we rename all the columns with shorter names, so that labels stay compact when we plot.

fire_data <- fire_data %>%
            rename(id = STARFIRE_INCIDENT_ID, datetime = INCIDENT_DATETIME, al_borough = ALARM_BOX_BOROUGH, al_number = ALARM_BOX_NUMBER, 
                   al_location = ALARM_BOX_LOCATION, inc_borough = INCIDENT_BOROUGH, zipcode = ZIPCODE, pol_prec = POLICEPRECINCT,
                   city_con_dist = CITYCOUNCILDISTRICT, commu_dist = COMMUNITYDISTRICT, commu_sc_dist = COMMUNITYSCHOOLDISTRICT,
                   cong_dist = CONGRESSIONALDISTRICT, al_source_desc = ALARM_SOURCE_DESCRIPTION_TX, al_index_desc = ALARM_LEVEL_INDEX_DESCRIPTION,
                   highest_al_level = HIGHEST_ALARM_LEVEL, inc_class = INCIDENT_CLASSIFICATION, inc_class_group = INCIDENT_CLASSIFICATION_GROUP,
                   disp_resp_qy = DISPATCH_RESPONSE_SECONDS_QY, first_ass_datetime = FIRST_ASSIGNMENT_DATETIME,
                   first_act_datetime = FIRST_ACTIVATION_DATETIME,  first_onscene_datetime = FIRST_ON_SCENE_DATETIME,
                   inc_close_datetime = INCIDENT_CLOSE_DATETIME, disp_resp_time_indc = VALID_DISPATCH_RSPNS_TIME_INDC,
                   inc_resp_sec_indc = VALID_INCIDENT_RSPNS_TIME_INDC, inc_resp_sec_qy = INCIDENT_RESPONSE_SECONDS_QY,
                   inc_travel_sec_qy = INCIDENT_TRAVEL_TM_SECONDS_QY, engines_assigned = ENGINES_ASSIGNED_QUANTITY,
                   ladders_assigned = LADDERS_ASSIGNED_QUANTITY, others_units_assigned = OTHER_UNITS_ASSIGNED_QUANTITY)
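
The same renaming can also be driven by a lookup vector, which keeps the old and new names side by side. The following is a minimal base R sketch on a toy data frame; the object names toy, lookup and hit are made up for illustration, and only three of the columns are shown.

```r
# Toy data frame with three of the original column names.
toy <- data.frame(
  STARFIRE_INCIDENT_ID = "230905-B1937-001-0567",
  INCIDENT_BOROUGH     = "BROOKLYN",
  ZIPCODE              = 11208L
)

# Lookup vector: names are the old column names, values the new ones.
lookup <- c(
  STARFIRE_INCIDENT_ID = "id",
  INCIDENT_BOROUGH     = "inc_borough",
  ZIPCODE              = "zipcode"
)

# Rename only the columns that appear in the lookup; others stay untouched.
hit <- names(toy) %in% names(lookup)
names(toy)[hit] <- unname(lookup[names(toy)[hit]])
names(toy)
```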

As the summary shows, several columns contain NA values, and many categorical predictors are stored as characters rather than factors. In this step we convert the character predictors to factors and merge their least frequent levels, so that no factor is left with many low-frequency levels.

In addition, we add three predictors derived from the incident timestamp: day_number, the day of the month; day_type, a factor indicating whether the incident occurred on a weekday or at the weekend; and time_of_day, a factor for the part of the day in which the incident occurred: Night (hours 0 to 6), Morning (hours 7 to 12), Afternoon (hours 13 to 18) or Evening (hours 19 to 23).

Finally, for convenience we convert the date-time columns, currently stored as characters in the form "09/05/2023 02:19:04 PM" (month/day/year and a 12-hour clock with an AM/PM suffix), to the POSIXct type.

# Convert the categorical character columns to factors
fire_data$inc_borough <- as.factor(fire_data$inc_borough)
fire_data$al_borough <- as.factor(fire_data$al_borough)
fire_data$al_source_desc <- as.factor(fire_data$al_source_desc)
fire_data$al_index_desc <- as.factor(fire_data$al_index_desc)
fire_data$highest_al_level <- as.factor(fire_data$highest_al_level)

# Validity flags: make sure both "N" and "Y" exist as levels, even though
# disp_resp_time_indc contains only "N" in this extract
fire_data$disp_resp_time_indc <- as.factor(fire_data$disp_resp_time_indc)
levels(fire_data$disp_resp_time_indc) <- c("N", "Y")

fire_data$inc_resp_sec_indc <- as.factor(fire_data$inc_resp_sec_indc)
levels(fire_data$inc_resp_sec_indc) <- c("N", "Y")

# Response (inc_class_group) and detailed classification
fire_data$inc_class_group <- as.factor(fire_data$inc_class_group)
fire_data$inc_class <- as.factor(fire_data$inc_class)
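
For an indicator in which only one value occurs, the set of levels can also be declared when the factor is built. A minimal sketch on a toy vector (flag and flag_f are illustrative names), shown as an alternative to assigning the levels afterwards:

```r
# Toy indicator column in which only "N" occurs, as in disp_resp_time_indc.
flag <- c("N", "N", "N")

# Declaring the levels explicitly keeps "Y" as an (empty) level and does not
# rely on the alphabetical order that as.factor() would produce.
flag_f <- factor(flag, levels = c("N", "Y"))

levels(flag_f)
table(flag_f)
```

Declaring the levels by name avoids relabelling by position, which would silently swap labels if the existing levels happened to be in a different order.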

The maximum values of the two response-time variables are large when expressed in seconds (7130 and 7122), so we rescale both to minutes.

summary(fire_data %>% select(inc_resp_sec_qy, inc_travel_sec_qy))
##  inc_resp_sec_qy  inc_travel_sec_qy
##  Min.   :  18.0   Min.   :   0.0   
##  1st Qu.: 265.0   1st Qu.: 233.0   
##  Median : 334.0   Median : 301.0   
##  Mean   : 380.7   Mean   : 340.5   
##  3rd Qu.: 426.0   3rd Qu.: 392.0   
##  Max.   :7130.0   Max.   :7122.0   
##  NA's   :14112    NA's   :14112
fire_data$inc_resp_sec_qy <- fire_data$inc_resp_sec_qy / 60
fire_data$inc_travel_sec_qy <- fire_data$inc_travel_sec_qy / 60
fire_data <- fire_data %>% rename(inc_resp_min_qy = inc_resp_sec_qy, inc_travel_min_qy = inc_travel_sec_qy)

Here we create day_number, day_type, time_of_day and ticket_time (the time in minutes between the creation and the closing of the incident).

# Parse the incident timestamps (month/day/year, 12-hour clock with AM/PM)
fire_data$datetime <- mdy_hms(fire_data$datetime)
fire_data$inc_close_datetime <- mdy_hms(fire_data$inc_close_datetime)

# Day of the month as a factor
fire_data$day_number <- as.factor(day(fire_data$datetime))

# Weekday vs weekend: wday() with week_start = 1 returns 6 for Saturday and
# 7 for Sunday, so the result does not depend on the system language
fire_data$day_type <- as.factor(ifelse(wday(fire_data$datetime, week_start = 1) >= 6, "Weekend", "Weekday"))

# Time between incident creation and incident close, in minutes
fire_data$ticket_time <- difftime(fire_data$inc_close_datetime, fire_data$datetime, units="mins")

# Part of the day from the hour (0-23); with right = TRUE the bins are
# hours 0-6, 7-12, 13-18 and 19-23
fire_data$time_of_day <- cut(
    hour(fire_data$datetime),
    breaks = c(0, 6, 12, 18, 24),
    labels = c("Night", "Morning", "Afternoon", "Evening"),
    include.lowest = TRUE,
    right = TRUE
)

# The raw timestamp is no longer needed
fire_data$datetime <- NULL
table(fire_data$day_number)
## 
##    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20 
## 1014 2059 2049 2006 2034 1989 2181 1971 1890 1836 1821 1856 1912 1779 1901 1870 
##   21   22   23   24   25   26   27   28   29   30 
## 1915 1859 1834 1796 1925 1879 1921 1961 2645 2097
table(fire_data$time_of_day)
## 
##     Night   Morning Afternoon   Evening 
##      8521     13270     16499     11710
table(fire_data$day_type)
## 
## Weekday Weekend 
##   36482   13518
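
To make the binning explicit, here is a small base R sketch on toy timestamps. It uses the same breaks as above and shows where the boundary hours 6 and 12 end up; it also derives the weekend flag from the numeric weekday, which does not depend on the system language. The timestamps and the object names ts and lt are made up for illustration.

```r
# Toy timestamps in UTC; 2023-09-05 was a Tuesday, 2023-09-09 a Saturday.
ts <- as.POSIXct(
  c("2023-09-05 00:10:00", "2023-09-05 06:30:00",
    "2023-09-05 12:00:00", "2023-09-09 23:59:00"),
  tz = "UTC"
)
lt <- as.POSIXlt(ts)

# Same breaks as above: with right = TRUE and include.lowest = TRUE the
# bins are [0,6], (6,12], (12,18] and (18,24].
time_of_day <- cut(
  lt$hour,
  breaks = c(0, 6, 12, 18, 24),
  labels = c("Night", "Morning", "Afternoon", "Evening"),
  include.lowest = TRUE,
  right = TRUE
)

# POSIXlt$wday is numeric (0 = Sunday, 6 = Saturday).
day_type <- factor(ifelse(lt$wday %in% c(0, 6), "Weekend", "Weekday"))

data.frame(hour = lt$hour, time_of_day, day_type)
```

With these settings an incident at 06:30 is labelled Night and one at 12:00 is labelled Morning; setting right = FALSE would instead assign hours 6 and 12 to the later bin.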

Here we add a new predictor, the total number of assigned units of all types.

fire_data$total_assigned_unit <- fire_data$engines_assigned + fire_data$ladders_assigned + fire_data$others_units_assigned
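
Each of the three count columns has 62 missing values in the summary, and the + operator propagates NA, so the total is missing whenever any of the counts is. The toy sketch below shows this, together with one possible alternative based on rowSums; that alternative treats a missing count as zero, which is an assumption and not what the analysis does.

```r
# Toy rows: the second has no unit counts recorded.
units <- data.frame(
  engines_assigned      = c(1, NA, 3),
  ladders_assigned      = c(0, NA, 2),
  others_units_assigned = c(0, NA, 1)
)

# `+` propagates NA, so an incident with missing counts gets an NA total.
total_plus <- units$engines_assigned + units$ladders_assigned +
  units$others_units_assigned

# Alternative (not used in this analysis): treat missing counts as zero.
total_rowsums <- rowSums(units, na.rm = TRUE)

data.frame(total_plus, total_rowsums)
```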

Convert the remaining date-time columns to the POSIXct type.

# The timestamps use a 12-hour clock with AM/PM, so we parse them with
# mdy_hms() as above; quiet = TRUE silences the messages for the empty strings
fire_data$first_onscene_datetime <- mdy_hms(fire_data$first_onscene_datetime, quiet = TRUE)
fire_data$first_ass_datetime <- mdy_hms(fire_data$first_ass_datetime, quiet = TRUE)
fire_data$first_act_datetime <- mdy_hms(fire_data$first_act_datetime, quiet = TRUE)
# inc_close_datetime was already parsed with mdy_hms() above
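
The timestamps in this file use a 12-hour clock with an AM/PM suffix. In base R, the format specifier meant to be paired with %p is %I (12-hour clock) rather than %H (24-hour clock). A minimal sketch on toy strings follows; forcing the C locale is an assumption made here so that "AM" and "PM" are recognised whatever the system language.

```r
# Use the C locale so that "AM"/"PM" are recognised on any system.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

stamps <- c("09/05/2023 02:19:12 PM", "", "09/05/2023 12:05:00 AM")

# %I is the 12-hour clock and is the specifier to combine with %p.
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Afternoon times come out in 24-hour form; the empty string becomes NA.
format(parsed, "%Y-%m-%d %H:%M:%S")

# Restore the previous locale.
Sys.setlocale("LC_TIME", old_locale)
```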

Rename the factor levels for the inc_borough predictor.

fire_data <- fire_data %>% mutate(inc_borough = recode_factor(inc_borough, "BRONX" = "Bronx", "BROOKLYN" = "Brooklyn", "MANHATTAN" = "Manhattan", "QUEENS" = "Queens", "RICHMOND / STATEN ISLAND" = "Staten Island"))

At this point we merge some levels of the factor predictors to reduce the number of possible values.

Here we merge the following levels of highest_al_level: Second Alarm and Third Alarm become 2nd-3rd Alarm.

# highest_al_level
fire_data$highest_alarm_lev_new <- fire_data$highest_al_level
levels(fire_data$highest_alarm_lev_new) <- list(
  "All Hands Working" = "All Hands Working",
  "First Alarm" = "First Alarm", 
  "2nd-3rd Alarm" = c("Second Alarm", "Third Alarm")
)

print(ctable(fire_data$highest_al_level, fire_data$highest_alarm_lev_new), method = 'render')

Cross-Tabulation, Row Proportions

highest_al_level * highest_alarm_lev_new

Data Frame: fire_data

                     highest_alarm_lev_new
highest_al_level     All Hands Working   First Alarm       2nd-3rd Alarm   Total
All Hands Working    100 (100.0%)        0 (0.0%)          0 (0.00%)       100 (100.0%)
First Alarm          0 (0.0%)            49891 (100.0%)    0 (0.00%)       49891 (100.0%)
Second Alarm         0 (0.0%)            0 (0.0%)          8 (100.00%)     8 (100.0%)
Third Alarm          0 (0.0%)            0 (0.0%)          1 (100.00%)     1 (100.0%)
Total                100 (0.2%)          49891 (99.8%)     9 (0.02%)       50000 (100.0%)

fire_data$highest_al_level <- fire_data$highest_alarm_lev_new
fire_data$highest_alarm_lev_new <- NULL
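
The merge relies on assigning a named list to levels(): each list name is a new level and its elements are the old levels mapped to it. A minimal sketch on a toy factor (alarm, merged and partial are illustrative names) shows the mechanism and one pitfall, namely that an old level left out of the list becomes NA.

```r
# Toy factor with the same four levels as highest_al_level.
alarm <- factor(c("First Alarm", "Second Alarm", "Third Alarm",
                  "All Hands Working", "First Alarm"))

merged <- alarm
levels(merged) <- list(
  "All Hands Working" = "All Hands Working",
  "First Alarm"       = "First Alarm",
  "2nd-3rd Alarm"     = c("Second Alarm", "Third Alarm")
)
table(original = alarm, merged = merged)

# A level left out of the list is turned into NA, so every old level
# should be listed explicitly.
partial <- alarm
levels(partial) <- list("First Alarm" = "First Alarm")
sum(is.na(partial))
```

This is why a cross-tabulation of the old against the new factor is a useful check: every row should place all of its observations in exactly one new level.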

Here we merge the following levels of al_index_desc into Others: Second Alarm, Third Alarm, 7-5 (All Hands Alarm), 10-76 & 10-77 Signal (Notification Hi-Rise Fire) and 10-75 Signal (Request for all hands alarm).

# al_index_desc
fire_data$alarm_level_idx_new <- fire_data$al_index_desc
levels(fire_data$alarm_level_idx_new) <- list(
  "DEFAULT RECORD" = "DEFAULT RECORD",
  "Initial Alarm" = "Initial Alarm", 
  "Others" = c("Second Alarm", "Third Alarm", "7-5 (All Hands Alarm)", 
               "10-76 & 10-77 Signal (Notification Hi-Rise Fire)",
               "10-75 Signal (Request for all hands alarm)")
)

print(ctable(fire_data$al_index_desc, fire_data$alarm_level_idx_new), method = 'render')

Cross-Tabulation, Row Proportions

al_index_desc * alarm_level_idx_new

Data Frame: fire_data

                                                  alarm_level_idx_new
al_index_desc                                     DEFAULT RECORD    Initial Alarm     Others          Total
10-75 Signal (Request for all hands alarm)        0 (0.0%)          0 (0.0%)          13 (100.0%)     13 (100.0%)
10-76 & 10-77 Signal (Notification Hi-Rise Fire)  0 (0.0%)          0 (0.0%)          3 (100.0%)      3 (100.0%)
7-5 (All Hands Alarm)                             0 (0.0%)          0 (0.0%)          100 (100.0%)    100 (100.0%)
DEFAULT RECORD                                    17313 (100.0%)    0 (0.0%)          0 (0.0%)        17313 (100.0%)
Initial Alarm                                     0 (0.0%)          32562 (100.0%)    0 (0.0%)        32562 (100.0%)
Second Alarm                                      0 (0.0%)          0 (0.0%)          8 (100.0%)      8 (100.0%)
Third Alarm                                       0 (0.0%)          0 (0.0%)          1 (100.0%)      1 (100.0%)
Total                                             17313 (34.6%)     32562 (65.1%)     125 (0.2%)      50000 (100.0%)

fire_data$al_index_desc <- fire_data$alarm_level_idx_new
fire_data$alarm_level_idx_new <- NULL

Here we merge the following levels of al_source_desc into Others: 911, 911TEXT, VERBAL, BARS, ERS, ERS-NC and SOL.

fire_data$alarm_source_desc_new <- fire_data$al_source_desc
levels(fire_data$alarm_source_desc_new) <- list(
  "PHONE" = "PHONE",
  "EMS" = "EMS",
  "EMS-911" = "EMS-911",
  "CLASS-3" = "CLASS-3",
  "Others" = c("911", "911TEXT", "VERBAL", "BARS", "ERS", "ERS-NC", "SOL")
)

print(ctable(fire_data$al_source_desc, fire_data$alarm_source_desc_new), method = 'render')

Cross-Tabulation, Row Proportions

al_source_desc * alarm_source_desc_new

Data Frame: fire_data

                alarm_source_desc_new
al_source_desc  PHONE            EMS              EMS-911          CLASS-3         Others          Total
911             0 (0.0%)         0 (0.0%)         0 (0.0%)         0 (0.0%)        302 (100.0%)    302 (100.0%)
911TEXT         0 (0.0%)         0 (0.0%)         0 (0.0%)         0 (0.0%)        14 (100.0%)     14 (100.0%)
BARS            0 (0.0%)         0 (0.0%)         0 (0.0%)         0 (0.0%)        1 (100.0%)      1 (100.0%)
CLASS-3         0 (0.0%)         0 (0.0%)         0 (0.0%)         5025 (100.0%)   0 (0.0%)        5025 (100.0%)
EMS             0 (0.0%)         17178 (100.0%)   0 (0.0%)         0 (0.0%)        0 (0.0%)        17178 (100.0%)
EMS-911         0 (0.0%)         0 (0.0%)         10520 (100.0%)   0 (0.0%)        0 (0.0%)        10520 (100.0%)
ERS             0 (0.0%)         0 (0.0%)         0 (0.0%)         0 (0.0%)        777 (100.0%)    777 (100.0%)
ERS-NC          0 (0.0%)         0 (0.0%)         0 (0.0%)         0 (0.0%)        1 (100.0%)      1 (100.0%)
PHONE           15146 (100.0%)   0 (0.0%)         0 (0.0%)         0 (0.0%)        0 (0.0%)        15146 (100.0%)
SOL             0 (0.0%)         0 (0.0%)         0 (0.0%)         0 (0.0%)        5 (100.0%)      5 (100.0%)
VERBAL          0 (0.0%)         0 (0.0%)         0 (0.0%)         0 (0.0%)        1031 (100.0%)   1031 (100.0%)
Total           15146 (30.3%)    17178 (34.4%)    10520 (21.0%)    5025 (10.1%)    2131 (4.3%)     50000 (100.0%)

fire_data$al_source_desc <- fire_data$alarm_source_desc_new
fire_data$alarm_source_desc_new <- NULL
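
The levels merged above were chosen by hand. Purely as an illustration, the same kind of grouping could be driven by a frequency threshold. The helper lump_rare, its arguments and the threshold used below are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: merge every level with fewer than `min_count`
# observations into a single level called `other`.
lump_rare <- function(f, min_count, other = "Others") {
  f <- as.factor(f)
  counts <- table(f)
  rare <- names(counts)[counts < min_count]
  lev <- levels(f)
  lev[lev %in% rare] <- other
  # Assigning duplicated names to levels() merges the corresponding levels.
  levels(f) <- lev
  f
}

# Toy alarm sources: two frequent levels and two rare ones.
src <- factor(c(rep("EMS", 5), rep("PHONE", 4), "BARS", "SOL", "SOL"))
lumped <- lump_rare(src, min_count = 3)
table(lumped)
```

A threshold makes the rule reproducible when the data are refreshed, while the hand-written list makes the final set of levels explicit; which is preferable depends on how stable the level names are.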

We display the dataset summary again to see the changes.

print(dfSummary(fire_data, 
                varnumbers   = FALSE, 
                valid.col    = FALSE, 
                graph.magnif = 0.76),
                method = 'render')

Data Frame Summary

fire_data

Dimensions: 50000 x 33
Duplicates: 0
Variable [type], Missing / Stats or Values with Freqs (% of Valid)

id [character], Missing: 0 (0.0%)
  1. 230905-B0042-001-1051: 1 (0.0%)
  2. 230905-B0053-001-0760: 1 (0.0%)
  3. 230905-B0053-002-0910: 1 (0.0%)
  4. 230905-B0081-001-1137: 1 (0.0%)
  5. 230905-B0106-002-0632: 1 (0.0%)
  6. 230905-B0132-001-0713: 1 (0.0%)
  7. 230905-B0147-001-0967: 1 (0.0%)
  8. 230905-B0160-001-1125: 1 (0.0%)
  9. 230905-B0163-001-1026: 1 (0.0%)
  10. 230905-B0165-001-0778: 1 (0.0%)
  [ 49990 others ]: 49990 (100.0%)

al_borough [factor], Missing: 0 (0.0%)
  1. BRONX: 10973 (21.9%)
  2. BROOKLYN: 13980 (28.0%)
  3. MANHATTAN: 12890 (25.8%)
  4. QUEENS: 9879 (19.8%)
  5. RICHMOND / STATEN ISLAND: 2278 (4.6%)

al_number [integer], Missing: 0 (0.0%)
  Mean (sd): 2930.3 (2446.5)
  min ≤ med ≤ max: 10 ≤ 2275 ≤ 9933
  IQR (CV): 2772 (0.8)
  7411 distinct values

al_location [character], Missing: 0 (0.0%)
  1. 8 AVE & W 155 ST: 85 (0.2%)
  2. 10 RICHMAN PLZ/SEDGWICK A: 75 (0.1%)
  3. AMSTERDAM AVE & LA SALLE: 50 (0.1%)
  4. 3 AVE & E 143 ST: 48 (0.1%)
  5. WASHINGTON AVE & E 170 ST: 48 (0.1%)
  6. FDR DR & E 6 ST: 45 (0.1%)
  7. CONCOURSE VILLAGE E & E 1: 44 (0.1%)
  8. PARK AVE & E 158 ST: 40 (0.1%)
  9. UNION TPK & WINCHESTER BL: 40 (0.1%)
  10. 8 AVE & W 33 ST: 39 (0.1%)
  [ 12203 others ]: 49486 (99.0%)

inc_borough [factor], Missing: 0 (0.0%)
  1. Bronx: 10973 (21.9%)
  2. Brooklyn: 13980 (28.0%)
  3. Manhattan: 12890 (25.8%)
  4. Queens: 9879 (19.8%)
  5. Staten Island: 2278 (4.6%)

zipcode [integer], Missing: 3181 (6.4%)
  Mean (sd): 10737.9 (551.8)
  min ≤ med ≤ max: 10000 ≤ 10472 ≤ 11697
  IQR (CV): 1098 (0.1)
  217 distinct values

pol_prec [integer], Missing: 3180 (6.4%)
  Mean (sd): 62.3 (34.8)
  min ≤ med ≤ max: 1 ≤ 61 ≤ 123
  IQR (CV): 56 (0.6)
  77 distinct values

city_con_dist [integer], Missing: 3180 (6.4%)
  Mean (sd): 23.1 (15.1)
  min ≤ med ≤ max: 1 ≤ 21 ≤ 51
  IQR (CV): 27 (0.7)
  51 distinct values

commu_dist [integer], Missing: 3180 (6.4%)
  Mean (sd): 262.9 (119.4)
  min ≤ med ≤ max: 101 ≤ 302 ≤ 595
  IQR (CV): 206 (0.5)
  70 distinct values

commu_sc_dist [integer], Missing: 3182 (6.4%)
  Mean (sd): 14.8 (9.7)
  min ≤ med ≤ max: 1 ≤ 13 ≤ 32
  IQR (CV): 18 (0.7)
  32 distinct values

cong_dist [integer], Missing: 3180 (6.4%)
  Mean (sd): 10.4 (3.3)
  min ≤ med ≤ max: 3 ≤ 11 ≤ 16
  IQR (CV): 5 (0.3)
  13 distinct values

al_source_desc [factor], Missing: 0 (0.0%)
  1. PHONE: 15146 (30.3%)
  2. EMS: 17178 (34.4%)
  3. EMS-911: 10520 (21.0%)
  4. CLASS-3: 5025 (10.1%)
  5. Others: 2131 (4.3%)

al_index_desc [factor], Missing: 0 (0.0%)
  1. DEFAULT RECORD: 17313 (34.6%)
  2. Initial Alarm: 32562 (65.1%)
  3. Others: 125 (0.2%)

highest_al_level [factor], Missing: 0 (0.0%)
  1. All Hands Working: 100 (0.2%)
  2. First Alarm: 49891 (99.8%)
  3. 2nd-3rd Alarm: 9 (0.0%)

inc_class [factor], Missing: 0 (0.0%)
  1. Abandoned Derelict Vehicl: 7 (0.0%)
  2. Alarm System - Defective: 387 (0.8%)
  3. Alarm System - Testing: 728 (1.5%)
  4. Alarm System - Unnecessar: 2845 (5.7%)
  5. Assist Civilian - Non-Med: 4140 (8.3%)
  6. Automobile Fire: 106 (0.2%)
  7. Brush Fire: 27 (0.1%)
  8. Carbon Monoxide - Code 1: 813 (1.6%)
  9. Carbon Monoxide - Code 2: 133 (0.3%)
  10. Carbon Monoxide - Code 3: 92 (0.2%)
  [ 57 others ]: 40722 (81.4%)

inc_class_group [factor], Missing: 0 (0.0%)
  1. Medical Emergencies: 26824 (53.6%)
  2. Medical MFAs: 208 (0.4%)
  3. NonMedical Emergencies: 19072 (38.1%)
  4. NonMedical MFAs: 1680 (3.4%)
  5. NonStructural Fires: 703 (1.4%)
  6. Structural Fires: 1513 (3.0%)

disp_resp_qy [integer], Missing: 0 (0.0%)
  Mean (sd): 40 (133.1)
  min ≤ med ≤ max: 2 ≤ 19 ≤ 9023
  IQR (CV): 33 (3.3)
  841 distinct values

first_ass_datetime [POSIXct, POSIXt], Missing: 0 (0.0%)
  min: 2023-09-05 02:19:12
  med: 2023-09-18 03:57:44
  max: 2023-10-01 12:05:02
  range: 26d 9H 45M 50S
  49020 distinct values

first_act_datetime [POSIXct, POSIXt], Missing: 139 (0.3%)
  min: 2023-09-05 02:19:26
  med: 2023-09-18 03:57:33
  max: 2023-10-01 12:05:16
  range: 26d 9H 45M 50S
  48730 distinct values

first_onscene_datetime [POSIXct, POSIXt], Missing: 14112 (28.2%)
  min: 2023-09-05 02:23:21
  med: 2023-09-18 05:35:57
  max: 2023-10-01 12:09:41
  range: 26d 9H 46M 20S
  35293 distinct values

inc_close_datetime [POSIXct, POSIXt], Missing: 0 (0.0%)
  min: 2023-09-05 14:25:05
  med: 2023-09-18 08:35:24
  max: 2023-10-01 00:58:42
  range: 25d 10H 33M 37S
  49409 distinct values

disp_resp_time_indc [factor], Missing: 0 (0.0%)
  1. N: 50000 (100.0%)
  2. Y: 0 (0.0%)

inc_resp_sec_indc [factor], Missing: 0 (0.0%)
  1. N: 17036 (34.1%)
  2. Y: 32964 (65.9%)

inc_resp_min_qy [numeric], Missing: 14112 (28.2%)
  Mean (sd): 6.3 (3.9)
  min ≤ med ≤ max: 0.3 ≤ 5.6 ≤ 118.8
  IQR (CV): 2.7 (0.6)
  1496 distinct values

inc_travel_min_qy [numeric], Missing: 14112 (28.2%)
  Mean (sd): 5.7 (3.5)
  min ≤ med ≤ max: 0 ≤ 5 ≤ 118.7
  IQR (CV): 2.6 (0.6)
  1382 distinct values

engines_assigned [integer], Missing: 62 (0.1%)
  Mean (sd): 1.1 (0.8)
  min ≤ med ≤ max: 0 ≤ 1 ≤ 19
  IQR (CV): 0 (0.7)
  15 distinct values

ladders_assigned [integer], Missing: 62 (0.1%)
  Mean (sd): 0.6 (0.8)
  min ≤ med ≤ max: 0 ≤ 0 ≤ 15
  IQR (CV): 1 (1.4)
  12 distinct values

others_units_assigned [integer], Missing: 62 (0.1%)
  Mean (sd): 0.3 (0.8)
  min ≤ med ≤ max: 0 ≤ 0 ≤ 32
  IQR (CV): 0 (2.8)
  23 distinct values

day_number [factor], Missing: 0 (0.0%)
  1. 5: 1014 (2.0%)
  2. 6: 2059 (4.1%)
  3. 7: 2049 (4.1%)
  4. 8: 2006 (4.0%)
  5. 9: 2034 (4.1%)
  6. 10: 1989 (4.0%)
  7. 11: 2181 (4.4%)
  8. 12: 1971 (3.9%)
  9. 13: 1890 (3.8%)
  10. 14: 1836 (3.7%)
  [ 16 others ]: 30971 (61.9%)

day_type [factor], Missing: 0 (0.0%)
  1. Weekday: 36482 (73.0%)
  2. Weekend: 13518 (27.0%)

ticket_time [difftime], Missing: 0 (0.0%)
  min: 0.316666666666667
  med: 14.6
  max: 2625.01666666667
  units: mins
  4870 distinct values

time_of_day [factor], Missing: 0 (0.0%)
  1. Night: 8521 (17.0%)
  2. Morning: 13270 (26.5%)
  3. Afternoon: 16499 (33.0%)
  4. Evening: 11710 (23.4%)

total_assigned_unit [integer], Missing: 62 (0.1%)
  Mean (sd): 2 (2)
  min ≤ med ≤ max: 1 ≤ 1 ≤ 66
  IQR (CV): 1 (1)
  35 distinct values


The next step is to deal with the NA values and delete some unhelpful predictors.

First, we noticed that al_borough and inc_borough may actually be the same column.

identical(fire_data$al_borough, fire_data$inc_borough)
## [1] FALSE

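identical() returns FALSE only because the two factors use different level labels (for example BRONX versus Bronx, and RICHMOND / STATEN ISLAND versus Staten Island). The summary above shows the same count for each borough, and the columns al_borough and inc_borough carry the same sequence of values, so we can delete one of the two.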
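One way to confirm that two factor columns agree row by row, even when their labels are spelled differently, is to compare their integer codes or to cross-tabulate them. The sketch below uses small toy vectors (hypothetical values, not the real data) and assumes that both factors list their levels in the same borough order.

```r
# Toy stand-ins for al_borough and inc_borough (hypothetical values)
al  <- factor(c("BRONX", "QUEENS", "BRONX", "MANHATTAN"),
              levels = c("BRONX", "MANHATTAN", "QUEENS"))
inc <- factor(c("Bronx", "Queens", "Bronx", "Manhattan"),
              levels = c("Bronx", "Manhattan", "Queens"))

# identical() also compares the level labels, so it reports FALSE here
labels_identical <- identical(al, inc)

# Comparing the underlying integer codes ignores the label spelling;
# this assumes both factors order their levels the same way
codes_match <- all(as.integer(al) == as.integer(inc))

# A contingency table does not need that assumption:
# every used row should have exactly one non-zero cell
one_to_one <- all(rowSums(table(al, inc) > 0) == 1)
```

If both checks hold on the real columns, dropping one of them loses no information.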

fire_data <- fire_data %>% select(-c(al_borough))

We then saw that every observation in the dataset has disp_resp_time_indc equal to N. Let’s check again and, if this is confirmed, delete both columns.

summary(fire_data$disp_resp_time_indc)
##     N     Y 
## 50000     0

All our observations have a non-valid disp_resp_time_indc, so we can delete both the indicator column and its respective quantity column, inc_travel_min_qy.

fire_data <- fire_data %>% select(-c(disp_resp_time_indc, inc_travel_min_qy))
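The same kind of check can be run over a whole data frame to spot columns that carry a single value. This is only a sketch on a toy data frame with hypothetical column names; it finds the constant indicator, while the decision to also drop the quantity column tied to it remains a judgement about the data.

```r
# Toy data frame with hypothetical columns (not the real fire_data)
toy <- data.frame(
  indc_a = factor(c("N", "N", "N"), levels = c("N", "Y")),
  indc_b = factor(c("N", "Y", "Y"), levels = c("N", "Y")),
  qty    = c(1.5, 2.0, 3.1)
)

# A column is treated as constant when it has at most one distinct non-NA value
is_constant <- vapply(
  toy,
  function(col) length(unique(na.omit(col))) <= 1,
  logical(1)
)
constant_cols <- names(toy)[is_constant]

# Keep only the columns that actually vary
toy_reduced <- toy[, !is_constant, drop = FALSE]
```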

Now we do a quick check on the other indicator variable, inc_resp_sec_indc.

summary(fire_data$inc_resp_sec_indc)
##     N     Y 
## 17036 32964

Here, however, some observations have a valid inc_resp_sec_indc. We will keep only the valid ones and delete those with a non-valid attribute.

However, before doing that, let’s look at how the valid and non-valid observations are distributed across the boroughs.

ggplot(data=fire_data %>% group_by(inc_borough, inc_resp_sec_indc) %>% summarise(incident_number = n()), 
       aes(x=inc_borough, y=incident_number, fill=inc_resp_sec_indc)) +
  geom_bar(stat="identity", position=position_dodge()) +
  geom_text(aes(label=incident_number), vjust=1.6, color="white",
            position = position_dodge(0.9), size=3.5) +
  scale_fill_brewer(palette="Paired") +
  labs(title = "Incident Count - Borough - Valid Response Time in Minutes", x = "Borough", y = "Incident Number", fill = "Valid Response\n Time in Minutes") +
  theme_gray()
## `summarise()` has grouped output by 'inc_borough'. You can override using the
## `.groups` argument.

The ratio of valid inc_resp_sec_indc in each borough is:

print(fire_data %>% group_by(inc_borough, inc_resp_sec_indc) %>% summarise(incident_number = n()) %>% mutate(ratio=incident_number/sum(incident_number))) %>% filter(inc_resp_sec_indc == "Y")
## `summarise()` has grouped output by 'inc_borough'. You can override using the
## `.groups` argument.
## # A tibble: 10 × 4
## # Groups:   inc_borough [5]
##    inc_borough   inc_resp_sec_indc incident_number ratio
##    <fct>         <fct>                       <int> <dbl>
##  1 Bronx         N                            4076 0.371
##  2 Bronx         Y                            6897 0.629
##  3 Brooklyn      N                            4224 0.302
##  4 Brooklyn      Y                            9756 0.698
##  5 Manhattan     N                            4981 0.386
##  6 Manhattan     Y                            7909 0.614
##  7 Queens        N                            3165 0.320
##  8 Queens        Y                            6714 0.680
##  9 Staten Island N                             590 0.259
## 10 Staten Island Y                            1688 0.741
## # A tibble: 5 × 4
## # Groups:   inc_borough [5]
##   inc_borough   inc_resp_sec_indc incident_number ratio
##   <fct>         <fct>                       <int> <dbl>
## 1 Bronx         Y                            6897 0.629
## 2 Brooklyn      Y                            9756 0.698
## 3 Manhattan     Y                            7909 0.614
## 4 Queens        Y                            6714 0.680
## 5 Staten Island Y                            1688 0.741
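The per-borough share of valid rows can also be obtained in one step, because the mean of a logical vector is the proportion of TRUE values. The sketch below uses base R on a toy data frame; the column names `borough` and `valid` are hypothetical.

```r
# Toy data (hypothetical values): two boroughs and a validity flag
toy <- data.frame(
  borough = c("A", "A", "A", "B", "B"),
  valid   = factor(c("Y", "N", "Y", "Y", "Y"), levels = c("N", "Y"))
)

# Share of valid rows per borough: mean of (valid == "Y") within each group
valid_ratio <- tapply(toy$valid == "Y", toy$borough, mean)
```

This avoids building the count table first and then dividing by the group total.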
ggplot(fire_data, aes(total_assigned_unit, inc_resp_min_qy)) + 
  geom_point(aes(colour = inc_resp_sec_indc))+
   labs(title = "Total Assigned Units - Response Time In Minutes", x = "Total Assigned Units", y = "Response Time In Minutes", colour = "Valid Response\n Time in Minutes") +
  theme_gray()
## Warning: Removed 14116 rows containing missing values (`geom_point()`).

ggplot(fire_data %>% filter(inc_resp_sec_indc == "N")
            , aes(total_assigned_unit, inc_resp_min_qy)) + 
  geom_point(aes(colour = inc_class_group)) +
  labs(title = "Total Assigned Units - Response Time In Minutes - Incident Class Group", x = "Total Assigned Units", y = "Response Time In Minutes", colour = "Incident Class Groups") +
  theme_gray()
## Warning: Removed 14112 rows containing missing values (`geom_point()`).

print(fire_data %>% 
        filter(inc_resp_sec_indc == "N", inc_class_group == "Medical Emergencies", total_assigned_unit == 1) %>%
        group_by(inc_borough) %>%
        summarise(incident_number = n()))
## # A tibble: 5 × 2
##   inc_borough   incident_number
##   <fct>                   <int>
## 1 Bronx                    3645
## 2 Brooklyn                 3694
## 3 Manhattan                4461
## 4 Queens                   2677
## 5 Staten Island             473
print(fire_data %>% 
        filter(inc_resp_sec_indc == "N", inc_class_group == "Medical Emergencies", total_assigned_unit != 1) %>%
        group_by(inc_borough) %>%
        summarise(incident_number = n()))
## # A tibble: 5 × 2
##   inc_borough   incident_number
##   <fct>                   <int>
## 1 Bronx                      20
## 2 Brooklyn                   29
## 3 Manhattan                  32
## 4 Queens                     27
## 5 Staten Island               5
print(fire_data %>% 
        filter(inc_resp_sec_indc == "N", inc_class_group == "Medical Emergencies", total_assigned_unit == 1) %>%
        group_by(inc_class) %>%
        summarise(incident_number = n()))
## # A tibble: 7 × 2
##   inc_class                              incident_number
##   <fct>                                            <int>
## 1 Medical - Assist Civilian                          122
## 2 Medical - Breathing / Ill or Sick                  651
## 3 Medical - EMS Link 10-91                          8356
## 4 Medical - No PT Contact EMS is Onscene             695
## 5 Medical - PD Link 10-91                           4833
## 6 Medical - Serious Life Threatening                 195
## 7 Medical - Victim Deceased                           98
ggplot(data=fire_data %>% 
        filter(inc_resp_sec_indc == "N", inc_class_group == "Medical Emergencies", total_assigned_unit == 1) %>%
        group_by(inc_class, inc_borough) %>%
        summarise(incident_number = n()), 
       aes(x=inc_borough, y=incident_number, fill=inc_class)) + geom_bar(stat="identity", position=position_dodge()) +
        geom_text(aes(label=incident_number), vjust=1.6, color="black",
                  position = position_dodge(0.9), size=3) +
        #scale_fill_brewer(palette="Paired") +
        labs(title = "Borough - Incident Counts - Incident Class", x = "Borough", y = "Incident Counts", fill = "Incident Class") +
        theme_grey()
## `summarise()` has grouped output by 'inc_class'. You can override using the
## `.groups` argument.

We conclude that the observations with inc_resp_sec_indc == "N", inc_class_group == "Medical Emergencies" and total_assigned_unit == 1 are mostly classified as Medical - EMS Link 10-91 and Medical - PD Link 10-91 (13,189 of 14,950 incidents, about 88%). These two signals are defined as follows:

  1. 10-91 Medical Emergency EMS - Fire Unit Not Required - To be transmitted through borough dispatcher by the responding unit when the fire Unit is canceled enroute due to EMS on scene, or EMS downgrades the job to a segment that does not require a Fire Unit response. Note: This signal shall be used only for medical emergency incidents. EMS stands for Emergency Medical Services.

  2. 10-91 Medical Emergency PD - Fire Unit Not Required - To be transmitted through borough dispatcher by the responding unit when the fire Unit is canceled enroute due to PD on scene, or PD downgrades the job to a segment that does not require a Fire Unit response. Note: This signal shall be used only for medical emergency incidents. PD stands for Police Department.

As additional evidence, we can also look at the relation with al_source_desc.

ggplot(fire_data %>% filter(inc_resp_sec_indc == "N")
            , aes(total_assigned_unit, inc_resp_min_qy)) +
  geom_point(aes(colour = factor(al_source_desc))) +
  labs(title = "Total Assigned Units - Response Time In Minutes - Alarm Source", x = "Total Assigned Units", y = "Response Time In Minutes", colour = "Alarm Source") +
  theme_gray()
## Warning: Removed 14112 rows containing missing values (`geom_point()`).

As expected, the highest inc_resp_min_qy values come from incidents whose al_source_desc is EMS or EMS-911, whereas most incidents with a higher number of total assigned units and a lower response time were reported by phone call.

Now we can look at the NonMedical Emergencies.

print(fire_data %>% 
        filter(inc_resp_sec_indc == "N", inc_class_group == "NonMedical Emergencies") %>%
        group_by(inc_class) %>%
        summarise(incident_number = n()))
## # A tibble: 24 × 2
##    inc_class                                         incident_number
##    <fct>                                                       <int>
##  1 Alarm System - Defective                                       10
##  2 Alarm System - Testing                                         22
##  3 Alarm System - Unnecessary                                    110
##  4 Assist Civilian - Non-Medical                                 828
##  5 Carbon Monoxide - Code 1 - Investigation                       25
##  6 Carbon Monoxide - Code 2 - Incident (1-9 ppm)                   4
##  7 Carbon Monoxide - Code 3 - Emergency (over 9 ppm)               4
##  8 Defective Oil Burner                                            5
##  9 Downed Tree                                                    28
## 10 Elevator Emergency - Occupied                                 104
## # ℹ 14 more rows
ggplot(data=fire_data %>% 
          filter(inc_resp_sec_indc == "N", inc_class_group == "NonMedical Emergencies", inc_class == "Assist Civilian - Non-Medical") %>%
          group_by(inc_borough) %>%
          summarise(incident_number = n()), 
        aes(x=inc_borough, y=incident_number)) + 
      geom_bar(stat="identity", position=position_dodge()) +
      geom_text(aes(label=incident_number), vjust=1.6, color="white", position = position_dodge(0.9), size=3.5) +
      #scale_fill_brewer(palette="Paired") +
      labs(title = "Incident Count - Borough - Valid Response Time in Second", x = "Borough", y = "Incident Count") +
      theme_minimal()

Most of the NonMedical Emergencies with a non-valid inc_resp_sec_indc belong to the incident class Assist Civilian - Non-Medical (828 incidents).

For the sake of consistency, we will consider only the valid observations, that is, those with inc_resp_sec_indc == "Y".

fire_data <- fire_data %>% filter(inc_resp_sec_indc == "Y")
dim(fire_data)
## [1] 32964    30

Now we want to know how many inc_class values are summarized in each inc_class_group.

print(ctable(fire_data$inc_class, fire_data$inc_class_group), method = 'render')

Cross-Tabulation, Row Proportions

inc_class * inc_class_group

Data Frame: fire_data
Rows: inc_class
Columns (inc_class_group): Medical Emergencies, Medical MFAs, NonMedical Emergencies, NonMedical MFAs, NonStructural Fires, Structural Fires, Total
Abandoned Derelict Vehicle Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 6 ( 100.0% ) 0 ( 0.0% ) 6 ( 100.0% )
Alarm System - Defective 0 ( 0.0% ) 0 ( 0.0% ) 377 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 377 ( 100.0% )
Alarm System - Testing 0 ( 0.0% ) 0 ( 0.0% ) 706 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 706 ( 100.0% )
Alarm System - Unnecessary 0 ( 0.0% ) 0 ( 0.0% ) 2735 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2735 ( 100.0% )
Assist Civilian - Non-Medical 0 ( 0.0% ) 0 ( 0.0% ) 3312 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3312 ( 100.0% )
Automobile Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 101 ( 100.0% ) 0 ( 0.0% ) 101 ( 100.0% )
Brush Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 24 ( 100.0% ) 0 ( 0.0% ) 24 ( 100.0% )
Carbon Monoxide - Code 1 - Investigation 0 ( 0.0% ) 0 ( 0.0% ) 788 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 788 ( 100.0% )
Carbon Monoxide - Code 2 - Incident (1-9 ppm) 0 ( 0.0% ) 0 ( 0.0% ) 129 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 129 ( 100.0% )
Carbon Monoxide - Code 3 - Emergency (over 9 ppm) 0 ( 0.0% ) 0 ( 0.0% ) 88 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 88 ( 100.0% )
Carbon Monoxide - Code 4 - No Detector Activation 0 ( 0.0% ) 0 ( 0.0% ) 8 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 8 ( 100.0% )
Church Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 10 ( 100.0% ) 10 ( 100.0% )
Defective Oil Burner 0 ( 0.0% ) 0 ( 0.0% ) 34 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 34 ( 100.0% )
Demolition Debris or Rubbish Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 272 ( 100.0% ) 0 ( 0.0% ) 272 ( 100.0% )
Downed Tree 0 ( 0.0% ) 0 ( 0.0% ) 280 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 280 ( 100.0% )
Elevator Emergency - Occupied 0 ( 0.0% ) 0 ( 0.0% ) 1850 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1850 ( 100.0% )
Elevator Emergency - Unoccupied 0 ( 0.0% ) 0 ( 0.0% ) 708 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 708 ( 100.0% )
Factory Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 100.0% ) 1 ( 100.0% )
Hospital Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 18 ( 100.0% ) 18 ( 100.0% )
Manhole Fire - Blown Cover 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 9 ( 100.0% ) 0 ( 0.0% ) 9 ( 100.0% )
Manhole Fire - Other 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 55 ( 100.0% ) 0 ( 0.0% ) 55 ( 100.0% )
Manhole Fire - Seeping Smoke 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 104 ( 100.0% ) 0 ( 0.0% ) 104 ( 100.0% )
Maritime Emergency 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% )
Maritime Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% )
Medical - Assist Civilian 27 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 27 ( 100.0% )
Medical - Breathing / Ill or Sick 4779 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4779 ( 100.0% )
Medical - EMS Link 10-91 1096 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1096 ( 100.0% )
Medical - No PT Contact EMS is Onscene 4285 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4285 ( 100.0% )
Medical - PD Link 10-91 868 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 868 ( 100.0% )
Medical - Serious Life Threatening 366 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 366 ( 100.0% )
Medical - Victim Deceased 287 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 287 ( 100.0% )
Medical MFA - EMS Link 0 ( 0.0% ) 87 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 87 ( 100.0% )
Medical MFA - PD Link 0 ( 0.0% ) 77 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 77 ( 100.0% )
Multiple Dwelling 'A' - Compactor fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 100.0% ) 4 ( 100.0% )
Multiple Dwelling 'A' - Food on the stove fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 519 ( 100.0% ) 519 ( 100.0% )
Multiple Dwelling 'A' - Other fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 168 ( 100.0% ) 168 ( 100.0% )
Multiple Dwelling 'B' Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 85 ( 100.0% ) 85 ( 100.0% )
Non-Medical 10-91 (Unnecessary Alarm) 0 ( 0.0% ) 0 ( 0.0% ) 102 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 102 ( 100.0% )
Non-Medical MFA - ERS 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 586 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 586 ( 100.0% )
Non-Medical MFA - ERS No Contact 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 100.0% )
Non-Medical MFA - Phone 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 701 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 701 ( 100.0% )
Non-Medical MFA - Private Fire Alarm 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 223 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 223 ( 100.0% )
Non-Medical MFA - Verbal 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 7 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 7 ( 100.0% )
Odor - Other Smoke 0 ( 0.0% ) 0 ( 0.0% ) 166 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 166 ( 100.0% )
Odor - Other Than Smoke 0 ( 0.0% ) 0 ( 0.0% ) 1317 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1317 ( 100.0% )
Other Commercial Building Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 184 ( 100.0% ) 184 ( 100.0% )
Other Public Building Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 100.0% ) 4 ( 100.0% )
Other Transportation Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 14 ( 100.0% ) 0 ( 0.0% ) 14 ( 100.0% )
Private Dwelling Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 412 ( 100.0% ) 412 ( 100.0% )
Remove Civilian - Non-Fire 0 ( 0.0% ) 0 ( 0.0% ) 27 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 27 ( 100.0% )
School Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 31 ( 100.0% ) 31 ( 100.0% )
Sprinkler System - Activated 0 ( 0.0% ) 0 ( 0.0% ) 6 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 6 ( 100.0% )
Sprinkler System - Malfunction 0 ( 0.0% ) 0 ( 0.0% ) 41 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 41 ( 100.0% )
Sprinkler System - Working on System 0 ( 0.0% ) 0 ( 0.0% ) 28 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 28 ( 100.0% )
Store Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 9 ( 100.0% ) 9 ( 100.0% )
Transit System - NonStructural 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 59 ( 100.0% ) 0 ( 0.0% ) 59 ( 100.0% )
Transit System - Structural 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 100.0% ) 1 ( 100.0% )
Transit System Emergency 0 ( 0.0% ) 0 ( 0.0% ) 18 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 18 ( 100.0% )
Undefined Emergency 0 ( 0.0% ) 0 ( 0.0% ) 71 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 71 ( 100.0% )
Under Contruction / Vacant Fire 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 100.0% ) 1 ( 100.0% )
Utility Emergency - Electric 0 ( 0.0% ) 0 ( 0.0% ) 595 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 595 ( 100.0% )
Utility Emergency - Gas 0 ( 0.0% ) 0 ( 0.0% ) 1335 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1335 ( 100.0% )
Utility Emergency - Steam 0 ( 0.0% ) 0 ( 0.0% ) 137 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 137 ( 100.0% )
Utility Emergency - Undefined 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 100.0% )
Utility Emergency - Water 0 ( 0.0% ) 0 ( 0.0% ) 1157 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1157 ( 100.0% )
Vehicle Accident - Other 0 ( 0.0% ) 0 ( 0.0% ) 1443 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1443 ( 100.0% )
Vehicle Accident - With Extrication 0 ( 0.0% ) 0 ( 0.0% ) 21 ( 100.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 21 ( 100.0% )
Total 11708 ( 35.5% ) 164 ( 0.5% ) 17483 ( 53.0% ) 1518 ( 4.6% ) 644 ( 2.0% ) 1447 ( 4.4% ) 32964 ( 100.0% )


As the table above shows, each inc_class belongs to exactly one inc_class_group, so the groups have disjoint sets of classes.

To make this clearer, we list each group together with its respective classes.
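Reading this off a long table by eye is error-prone, so the one-group-per-class property can also be checked programmatically. The following is a minimal sketch on a toy data frame; the column names `cls` and `grp` and the rows are hypothetical.

```r
# Toy data (hypothetical rows): a class column and the group it belongs to
toy <- data.frame(
  cls = c("Brush Fire", "Brush Fire", "Store Fire", "Downed Tree"),
  grp = c("NonStructural Fires", "NonStructural Fires",
          "Structural Fires", "NonMedical Emergencies")
)

# Cross-tabulate and count, for each class, how many groups it appears in
tab <- table(toy$cls, toy$grp)
groups_per_class <- rowSums(tab > 0)

# "<= 1" rather than "== 1" so that unused factor levels (all-zero rows) pass
all_unique <- all(groups_per_class <= 1)
```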

for (variable in levels(fire_data$inc_class_group)) {
  non_zero_table <- table(subset(fire_data, inc_class_group == variable)$inc_class)
  cat(variable, "\n")
  print(non_zero_table[non_zero_table != 0])
  cat("\n")
}
## Medical Emergencies 
## 
##              Medical - Assist Civilian      Medical - Breathing / Ill or Sick 
##                                     27                                   4779 
##               Medical - EMS Link 10-91 Medical - No PT Contact EMS is Onscene 
##                                   1096                                   4285 
##                Medical - PD Link 10-91     Medical - Serious Life Threatening 
##                                    868                                    366 
##              Medical - Victim Deceased 
##                                    287 
## 
## Medical MFAs 
## 
## Medical MFA - EMS Link  Medical MFA - PD Link 
##                     87                     77 
## 
## NonMedical Emergencies 
## 
##                          Alarm System - Defective 
##                                               377 
##                            Alarm System - Testing 
##                                               706 
##                        Alarm System - Unnecessary 
##                                              2735 
##                     Assist Civilian - Non-Medical 
##                                              3312 
##          Carbon Monoxide - Code 1 - Investigation 
##                                               788 
##     Carbon Monoxide - Code 2 - Incident (1-9 ppm) 
##                                               129 
## Carbon Monoxide - Code 3 - Emergency (over 9 ppm) 
##                                                88 
## Carbon Monoxide - Code 4 - No Detector Activation 
##                                                 8 
##                              Defective Oil Burner 
##                                                34 
##                                       Downed Tree 
##                                               280 
##                     Elevator Emergency - Occupied 
##                                              1850 
##                   Elevator Emergency - Unoccupied 
##                                               708 
##             Non-Medical 10-91 (Unnecessary Alarm) 
##                                               102 
##                                Odor - Other Smoke 
##                                               166 
##                           Odor - Other Than Smoke 
##                                              1317 
##                        Remove Civilian - Non-Fire 
##                                                27 
##                      Sprinkler System - Activated 
##                                                 6 
##                    Sprinkler System - Malfunction 
##                                                41 
##              Sprinkler System - Working on System 
##                                                28 
##                          Transit System Emergency 
##                                                18 
##                               Undefined Emergency 
##                                                71 
##                      Utility Emergency - Electric 
##                                               595 
##                           Utility Emergency - Gas 
##                                              1335 
##                         Utility Emergency - Steam 
##                                               137 
##                     Utility Emergency - Undefined 
##                                                 4 
##                         Utility Emergency - Water 
##                                              1157 
##                          Vehicle Accident - Other 
##                                              1443 
##               Vehicle Accident - With Extrication 
##                                                21 
## 
## NonMedical MFAs 
## 
##                Non-Medical MFA - ERS     Non-Medical MFA - ERS No Contact 
##                                  586                                    1 
##              Non-Medical MFA - Phone Non-Medical MFA - Private Fire Alarm 
##                                  701                                  223 
##             Non-Medical MFA - Verbal 
##                                    7 
## 
## NonStructural Fires 
## 
##   Abandoned Derelict Vehicle Fire                   Automobile Fire 
##                                 6                               101 
##                        Brush Fire Demolition Debris or Rubbish Fire 
##                                24                               272 
##        Manhole Fire - Blown Cover              Manhole Fire - Other 
##                                 9                                55 
##      Manhole Fire - Seeping Smoke         Other Transportation Fire 
##                               104                                14 
##    Transit System - NonStructural 
##                                59 
## 
## Structural Fires 
## 
##                                    Church Fire 
##                                             10 
##                                   Factory Fire 
##                                              1 
##                                  Hospital Fire 
##                                             18 
##         Multiple Dwelling 'A' - Compactor fire 
##                                              4 
## Multiple Dwelling 'A' - Food on the stove fire 
##                                            519 
##             Multiple Dwelling 'A' - Other fire 
##                                            168 
##                     Multiple Dwelling 'B' Fire 
##                                             85 
##                 Other Commercial Building Fire 
##                                            184 
##                     Other Public Building Fire 
##                                              4 
##                          Private Dwelling Fire 
##                                            412 
##                                    School Fire 
##                                             31 
##                                     Store Fire 
##                                              9 
##                    Transit System - Structural 
##                                              1 
##                Under Contruction / Vacant Fire 
##                                              1

3.1 NA Patterns?

At this point it is essential to deal with the NA values and to look for possible relations with the predictors. First, let’s recap the number of NA values in each column.

colSums(is.na(fire_data))
##                     id              al_number            al_location 
##                      0                      0                      0 
##            inc_borough                zipcode               pol_prec 
##                      0                   2197                   2197 
##          city_con_dist             commu_dist          commu_sc_dist 
##                   2197                   2197                   2198 
##              cong_dist         al_source_desc          al_index_desc 
##                   2197                      0                      0 
##       highest_al_level              inc_class        inc_class_group 
##                      0                      0                      0 
##           disp_resp_qy     first_ass_datetime     first_act_datetime 
##                      0                      0                     41 
## first_onscene_datetime     inc_close_datetime      inc_resp_sec_indc 
##                      0                      0                      0 
##        inc_resp_min_qy       engines_assigned       ladders_assigned 
##                      0                      4                      4 
##  others_units_assigned             day_number               day_type 
##                      4                      0                      0 
##            ticket_time            time_of_day    total_assigned_unit 
##                      0                      0                      4
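The counts above are almost identical across the location columns, which suggests that those columns go missing on the same rows. A minimal sketch of how that could be checked, using a made-up `toy` data frame in place of `fire_data` (the values are illustrative assumptions, not taken from the dataset):

```r
# Toy data standing in for fire_data; all values are invented for illustration
toy <- data.frame(
  zipcode          = c(10001, NA, 11201, NA, 10451),
  pol_prec         = c(1, NA, 84, NA, 40),
  engines_assigned = c(1, 2, NA, 1, 3)
)

# Share of missing values per column, as a complement to the raw counts
na_share <- colMeans(is.na(toy))
print(round(na_share, 3))

# Cross-tabulate the NA indicators of two columns:
# if they are always missing together, the off-diagonal cells are 0
co_missing <- table(zipcode_na  = is.na(toy$zipcode),
                    pol_prec_na = is.na(toy$pol_prec))
print(co_missing)
```

On the real data the same two calls would be applied to `fire_data` and to any pair of location columns.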

3.1.1 Checking the location predictors

Here we check whether there is a pattern in the missing values of the following predictors: zipcode, pol_prec, city_con_dist, commu_dist, commu_sc_dist and cong_dist.

na_locations <- fire_data %>%
  filter(is.na(zipcode) | is.na(pol_prec) | is.na(city_con_dist) | is.na(commu_dist) | is.na(commu_sc_dist) | is.na(cong_dist))
ggplot(data=na_locations %>% 
        group_by(inc_class_group, inc_borough) %>%
        summarise(incident_number = n()), 
       aes(x=inc_borough, y=incident_number, fill=inc_class_group)) + geom_bar(stat="identity", position=position_dodge()) +
        geom_text(aes(label=incident_number), vjust=1.6, color="black",
                  position = position_dodge(0.9), size=3.5) +
        #scale_fill_brewer(palette="Paired") +
        labs(title = "NA location", x = "Borough", y = "Incident Count", fill = "Incident Class Group") +
        theme_grey()
## `summarise()` has grouped output by 'inc_class_group'. You can override using
## the `.groups` argument.

table(na_locations$inc_borough) / table(fire_data$inc_borough)
## 
##         Bronx      Brooklyn     Manhattan        Queens Staten Island 
##    0.07104538    0.04694547    0.07662157    0.07700328    0.07523697
table(na_locations$inc_class_group) / table(fire_data$inc_class_group)
## 
##    Medical Emergencies           Medical MFAs NonMedical Emergencies 
##             0.03988726             0.10975610             0.05691243 
##        NonMedical MFAs    NonStructural Fires       Structural Fires 
##             0.40447958             0.14906832             0.00552868

So about 40% of the incidents in the NonMedical MFAs class group have at least one location column set to NA. Let’s investigate.

fd_nm_mfa_cl <- table(subset(fire_data, inc_class_group == "NonMedical MFAs")$inc_class)
fd_nm_mfa_bro <- table(subset(fire_data, inc_class_group == "NonMedical MFAs")$inc_borough)

fd_nm_mfa_cl <- fd_nm_mfa_cl[fd_nm_mfa_cl != 0]
fd_nm_mfa_cl
## 
##                Non-Medical MFA - ERS     Non-Medical MFA - ERS No Contact 
##                                  586                                    1 
##              Non-Medical MFA - Phone Non-Medical MFA - Private Fire Alarm 
##                                  701                                  223 
##             Non-Medical MFA - Verbal 
##                                    7
na_nm_mfa_cl <- table(subset(na_locations, inc_class_group == "NonMedical MFAs")$inc_class)
na_nm_mfa_bro <- table(subset(na_locations, inc_class_group == "NonMedical MFAs")$inc_borough)

na_nm_mfa_cl <- na_nm_mfa_cl[names(fd_nm_mfa_cl)]
na_nm_mfa_cl
## 
##                Non-Medical MFA - ERS     Non-Medical MFA - ERS No Contact 
##                                  573                                    1 
##              Non-Medical MFA - Phone Non-Medical MFA - Private Fire Alarm 
##                                   38                                    2 
##             Non-Medical MFA - Verbal 
##                                    0
na_nm_mfa_cl / fd_nm_mfa_cl
## 
##                Non-Medical MFA - ERS     Non-Medical MFA - ERS No Contact 
##                           0.97781570                           1.00000000 
##              Non-Medical MFA - Phone Non-Medical MFA - Private Fire Alarm 
##                           0.05420827                           0.00896861 
##             Non-Medical MFA - Verbal 
##                           0.00000000

So about 98% of the Non-Medical MFA - ERS observations in the entire dataset have at least one location attribute equal to NA.

na_nm_mfa_bro / fd_nm_mfa_bro
## 
##         Bronx      Brooklyn     Manhattan        Queens Staten Island 
##     0.4676056     0.3075221     0.3894472     0.3733333     0.7954545

From here we can see that about 80% of the NonMedical MFAs observations in Staten Island have at least one location column set to NA. The Bronx follows, with about 47% of its NonMedical MFAs observations affected.
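The ratios above divide one `table()` by another, which works only when both tables list the same levels in the same order. As a hedged alternative, the share of rows with a missing value can be computed per group in a single grouped summary. The `toy` data frame below is an invented stand-in for `fire_data`:

```r
library(dplyr)

# Toy data standing in for fire_data; all values are invented for illustration
toy <- data.frame(
  inc_borough = c("Bronx", "Bronx", "Queens", "Queens", "Queens"),
  zipcode     = c(NA, 10451, NA, NA, 11377)
)

# One row per borough: total rows, rows with a missing zipcode, and their share
na_share_by_group <- toy %>%
  group_by(inc_borough) %>%
  summarise(
    n_total  = n(),
    n_na     = sum(is.na(zipcode)),
    na_share = mean(is.na(zipcode)),
    .groups  = "drop"   # also silences the "grouped output" message
  )
print(na_share_by_group)
```

Because the numerator and the denominator come from the same grouped rows, the result does not depend on how the factor levels are ordered.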

3.1.2 Checking the assigned units predictors

print(fire_data %>%
  filter(is.na(engines_assigned) | is.na(ladders_assigned) | is.na(others_units_assigned)) %>%
  group_by(inc_borough, inc_class) %>%
  summarise(incident_count = n()))
## `summarise()` has grouped output by 'inc_borough'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 3
## # Groups:   inc_borough [2]
##   inc_borough inc_class                              incident_count
##   <fct>       <fct>                                           <int>
## 1 Manhattan   Vehicle Accident - Other                            1
## 2 Queens      Assist Civilian - Non-Medical                       1
## 3 Queens      Medical - No PT Contact EMS is Onscene              1
## 4 Queens      Medical - PD Link 10-91                             1

3.1.3 Checking the first_act_datetime predictors

na_first_act_datetime <- fire_data %>% filter(is.na(first_act_datetime))
print(na_first_act_datetime %>% group_by(inc_class, inc_borough) %>% summarise(incident_count = n()))
## `summarise()` has grouped output by 'inc_class'. You can override using the
## `.groups` argument.
## # A tibble: 27 × 3
## # Groups:   inc_class [15]
##    inc_class                         inc_borough   incident_count
##    <fct>                             <fct>                  <int>
##  1 Alarm System - Unnecessary        Brooklyn                   2
##  2 Assist Civilian - Non-Medical     Bronx                      1
##  3 Assist Civilian - Non-Medical     Brooklyn                   4
##  4 Assist Civilian - Non-Medical     Queens                     1
##  5 Demolition Debris or Rubbish Fire Brooklyn                   1
##  6 Downed Tree                       Manhattan                  1
##  7 Downed Tree                       Queens                     1
##  8 Downed Tree                       Staten Island              1
##  9 Elevator Emergency - Occupied     Brooklyn                   1
## 10 Elevator Emergency - Occupied     Manhattan                  1
## # ℹ 17 more rows
ggplot(data=na_first_act_datetime %>% 
        group_by(inc_class_group, inc_borough) %>%
        summarise(incident_number = n()), 
       aes(x=inc_borough, y=incident_number, fill=inc_class_group)) + geom_bar(stat="identity", position=position_dodge()) +
labs(title = "NA First Act Date", x = "Borough", y = "Incident Count", fill = "Incident Class Group") +
  theme_minimal()
## `summarise()` has grouped output by 'inc_class_group'. You can override using
## the `.groups` argument.

The missing values seem to be random: no pattern explains the presence of NA values in first_act_datetime.

At this point we can omit the rows with NA values.

fire_data <- na.omit(fire_data)
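Since `na.omit()` drops every row that has an NA in any column, it can be useful to record how many rows are lost. A small sketch on an invented `toy` data frame (the column names and values are assumptions for illustration only):

```r
# Toy data standing in for fire_data; all values are invented for illustration
toy <- data.frame(
  id        = 1:6,
  zipcode   = c(10001, NA, 11201, NA, 10451, 11377),
  first_act = c("a", "b", NA, "d", "e", "f")
)

n_before  <- nrow(toy)
toy_clean <- na.omit(toy)   # keeps only rows without any NA
n_after   <- nrow(toy_clean)

cat("rows dropped:", n_before - n_after, "of", n_before, "\n")

# complete.cases() flags the same rows that na.omit() keeps
kept_by_complete_cases <- sum(complete.cases(toy))
```

Running the same comparison on `fire_data` before and after the call above would show how much of the dataset the listwise deletion removes.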

3.2 Additional Data Visualization

In this section we plot additional visualizations, focusing on the geography of the New York boroughs and the related predictors. To do so we load two additional datasets:

1. Alarm_Box_Locations.csv contains geographical information about the alarm boxes, including the latitude and longitude needed to plot them as points on a map.
2. fdny-firehouse-listing.csv contains geographical information about every firefighter station in NYC, again including latitude and longitude.

alarm_box_loc <- read.csv("datasets/Alarm_Box_Locations.csv")

head(alarm_box_loc)
##   BOROBOX BOX_TYPE                    LOCATION   ZIP       BOROUGH
## 1   B2653      ERS               3 AVE & 65 ST 11220      Brooklyn
## 2   Q7917     BARS        WOODSIDE AVE & 69 ST 11377        Queens
## 3   B0801      ERS    MYRTLE AVE & PALMETTO ST 11237      Brooklyn
## 4   B1046      ERS NEW YORK AVE & LEFFERTS AVE 11225      Brooklyn
## 5   B0109      ERS         RIVER & NORTH 3 STS 11211      Brooklyn
## 6   R2465     BARS    VINCENT AVE & COVERLY ST 10306 Staten Island
##   COMMUNITYDISTICT CITYCOUNCIL LATITUDE LONGITUDE
## 1             BK07          38 40.63932 -74.02355
## 2             QN02          26 40.74269 -73.89565
## 3             QN05          34 40.69953 -73.91103
## 4             BK09          40 40.66253 -73.94791
## 5             BK01          33 40.71838 -73.96462
## 6             SI03          50 40.57084 -74.12511
##                     Location.Point
## 1 POINT (-74.02354939 40.63932033)
## 2  POINT (-73.89565167 40.7426855)
## 3  POINT (-73.9110349 40.69953211)
## 4 POINT (-73.94791393 40.66253364)
## 5 POINT (-73.96462115 40.71837562)
## 6 POINT (-74.12510919 40.57084247)
summary(alarm_box_loc)
##    BOROBOX            BOX_TYPE           LOCATION              ZIP       
##  Length:13008       Length:13008       Length:13008       Min.   :   83  
##  Class :character   Class :character   Class :character   1st Qu.:10314  
##  Mode  :character   Mode  :character   Mode  :character   Median :11211  
##                                                           Mean   :10864  
##                                                           3rd Qu.:11369  
##                                                           Max.   :11697  
##                                                           NA's   :27     
##    BOROUGH          COMMUNITYDISTICT    CITYCOUNCIL       LATITUDE    
##  Length:13008       Length:13008       Min.   : 1.00   Min.   :40.50  
##  Class :character   Class :character   1st Qu.:19.00   1st Qu.:40.64  
##  Mode  :character   Mode  :character   Median :28.00   Median :40.71  
##                                        Mean   :28.89   Mean   :40.71  
##                                        3rd Qu.:41.00   3rd Qu.:40.76  
##                                        Max.   :51.00   Max.   :40.91  
##                                        NA's   :4                      
##    LONGITUDE      Location.Point    
##  Min.   :-74.25   Length:13008      
##  1st Qu.:-73.98   Class :character  
##  Median :-73.92   Mode  :character  
##  Mean   :-73.92                     
##  3rd Qu.:-73.84                     
##  Max.   :-73.70                     
## 
firefighter_stations <- read.csv("datasets/fdny-firehouse-listing.csv")

head(firefighter_stations)
##                                              FacilityName       FacilityAddress
## 1                                      Engine 4/Ladder 15       42 South Street
## 2                                     Engine 10/Ladder 10    124 Liberty Street
## 3                                                Engine 6     49 Beekman Street
## 4 Engine 7/Ladder 1/Battalion 1/Manhattan Borough Command  100-104 Duane Street
## 5                                                Ladder 8 14 North Moore Street
## 6                                       Engine 9/Ladder 6       75 Canal Street
##     Borough Postcode Latitude Longitude Community.Board Community.Council
## 1 Manhattan    10005 40.70347 -74.00754               1                 1
## 2 Manhattan    10006 40.71007 -74.01252               1                 1
## 3 Manhattan    10038 40.71005 -74.00525               1                 1
## 4 Manhattan    10007 40.71546 -74.00594               1                 1
## 5 Manhattan    10013 40.71976 -74.00668               1                 1
## 6 Manhattan    10002 40.71521 -73.99290               3                 1
##   Census.Tract     BIN        BBL
## 1            7 1000867 1000350001
## 2           13 1075700 1000520022
## 3         1501 1001287 1000930030
## 4           33 1001647 1001500025
## 5           33 1002150 1001890035
## 6           16 1003898 1003000030
##                                                                           NTA
## 1 Battery Park City-Lower Manhattan                                          
## 2 Battery Park City-Lower Manhattan                                          
## 3 Battery Park City-Lower Manhattan                                          
## 4 SoHo-TriBeCa-Civic Center-Little Italy                                     
## 5 SoHo-TriBeCa-Civic Center-Little Italy                                     
## 6 Chinatown
summary(firefighter_stations)
##  FacilityName       FacilityAddress      Borough             Postcode    
##  Length:218         Length:218         Length:218         Min.   :10001  
##  Class :character   Class :character   Class :character   1st Qu.:10304  
##  Mode  :character   Mode  :character   Mode  :character   Median :11103  
##                                                           Mean   :10784  
##                                                           3rd Qu.:11231  
##                                                           Max.   :11695  
##                                                           NA's   :5      
##     Latitude       Longitude      Community.Board  Community.Council
##  Min.   :40.51   Min.   :-74.24   Min.   : 1.000   Min.   : 1.00    
##  1st Qu.:40.66   1st Qu.:-73.99   1st Qu.: 3.000   1st Qu.:12.00    
##  Median :40.72   Median :-73.94   Median : 6.000   Median :27.00    
##  Mean   :40.72   Mean   :-73.94   Mean   : 7.075   Mean   :25.63    
##  3rd Qu.:40.77   3rd Qu.:-73.89   3rd Qu.:11.000   3rd Qu.:38.00    
##  Max.   :40.89   Max.   :-73.72   Max.   :84.000   Max.   :51.00    
##  NA's   :5       NA's   :5        NA's   :5        NA's   :5        
##   Census.Tract         BIN               BBL                NTA           
##  Min.   :     1   Min.   :1000867   Min.   :1.000e+09   Length:218        
##  1st Qu.:   129   1st Qu.:2003268   1st Qu.:2.025e+09   Class :character  
##  Median :   275   Median :3064786   Median :3.025e+09   Mode  :character  
##  Mean   :  5950   Mean   :2900421   Mean   :2.850e+09                     
##  3rd Qu.:   800   3rd Qu.:4090228   3rd Qu.:4.033e+09                     
##  Max.   :157902   Max.   :5154879   Max.   :5.080e+09                     
##  NA's   :5        NA's   :5         NA's   :5

We now start with the firefighter stations dataset.

# make a copy of the fire_data
fire_data_for_ffs <- fire_data

fire_data_for_ffs <- fire_data_for_ffs %>% rename(borough = inc_borough)

firefighter_stations$Borough <- as.factor(firefighter_stations$Borough)
firefighter_stations <- firefighter_stations %>% rename(borough = Borough)

# remove the na values
firefighter_stations <- na.omit(firefighter_stations)
stations_borough <- firefighter_stations %>%
                    group_by(borough) %>%
                    summarise(number_of_stations = n())
count_inc_brough <- fire_data_for_ffs %>% group_by(borough) %>% summarise(incident_count = n())

stations_borough$incident_per_station <- round(count_inc_brough$incident_count / stations_borough$number_of_stations, digits = 3)

count_inc_brough <- merge(count_inc_brough, stations_borough, by="borough")
firefighter_stations <- firefighter_stations %>% rename(lat = Latitude, lon = Longitude)
firefighter_stations_sdf <- st_as_sf(firefighter_stations, coords = c("lon", "lat"), crs = 4326)
head(firefighter_stations_sdf)
## Simple feature collection with 6 features and 10 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -74.01252 ymin: 40.70347 xmax: -73.9929 ymax: 40.71976
## Geodetic CRS:  WGS 84
##                                              FacilityName       FacilityAddress
## 1                                      Engine 4/Ladder 15       42 South Street
## 2                                     Engine 10/Ladder 10    124 Liberty Street
## 3                                                Engine 6     49 Beekman Street
## 4 Engine 7/Ladder 1/Battalion 1/Manhattan Borough Command  100-104 Duane Street
## 5                                                Ladder 8 14 North Moore Street
## 6                                       Engine 9/Ladder 6       75 Canal Street
##     borough Postcode Community.Board Community.Council Census.Tract     BIN
## 1 Manhattan    10005               1                 1            7 1000867
## 2 Manhattan    10006               1                 1           13 1075700
## 3 Manhattan    10038               1                 1         1501 1001287
## 4 Manhattan    10007               1                 1           33 1001647
## 5 Manhattan    10013               1                 1           33 1002150
## 6 Manhattan    10002               3                 1           16 1003898
##          BBL
## 1 1000350001
## 2 1000520022
## 3 1000930030
## 4 1001500025
## 5 1001890035
## 6 1003000030
##                                                                           NTA
## 1 Battery Park City-Lower Manhattan                                          
## 2 Battery Park City-Lower Manhattan                                          
## 3 Battery Park City-Lower Manhattan                                          
## 4 SoHo-TriBeCa-Civic Center-Little Italy                                     
## 5 SoHo-TriBeCa-Civic Center-Little Italy                                     
## 6 Chinatown                                                                  
##                     geometry
## 1 POINT (-74.00754 40.70347)
## 2 POINT (-74.01252 40.71007)
## 3 POINT (-74.00525 40.71005)
## 4 POINT (-74.00594 40.71546)
## 5 POINT (-74.00668 40.71976)
## 6  POINT (-73.9929 40.71521)
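The `incident_per_station` column above is obtained by dividing two vectors element by element, so it is correct only if both summaries list the boroughs in the same order. A hedged sketch of a key-based alternative, with invented `incidents` and `stations` tables whose rows are deliberately in different orders:

```r
library(dplyr)

# Invented per-borough summaries; note the different row order in the two tables
incidents <- data.frame(
  borough        = c("Queens", "Bronx", "Brooklyn"),
  incident_count = c(330, 210, 400)
)
stations <- data.frame(
  borough            = c("Bronx", "Brooklyn", "Queens"),
  number_of_stations = c(20, 40, 30)
)

# Join on the borough name first, then compute the ratio row by row
per_station <- incidents %>%
  inner_join(stations, by = "borough") %>%
  mutate(incident_per_station = round(incident_count / number_of_stations, digits = 3))
print(per_station)
```

Joining before dividing keeps each count paired with the station total of its own borough, whatever the row order.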

Loading the GeoJSON with the borough boundaries

geojson_newyork <- geojson_read("datasets/NYC_BoroughBoundaries.geojson",  what = "sp")
geojson_newyork <- setNames(geojson_newyork, c("borough_code", "borough", "shape_area", "shape_leng"))
geojson_newyork$borough <- as.factor(geojson_newyork$borough)
geojson_newyork$borough_code <- NULL
head(geojson_newyork)
## class       : SpatialPolygonsDataFrame 
## features    : 5 
## extent      : -74.25559, -73.70001, 40.49613, 40.91553  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 3
## names       :       borough,    shape_area,    shape_leng 
## min values  :         Bronx,  1187174772.5,  325917.35395 
## max values  : Staten Island, 636520502.758, 888199.731385
geojson_newyork@data = data.frame(geojson_newyork@data, count_inc_brough[match(geojson_newyork@data$borough, count_inc_brough$borough),])
geojson_newyork@data$borough.1 <- NULL
mapview(list(firefighter_stations_sdf, geojson_newyork),
        zcol = list(NULL, "incident_count"),
        legend = list(FALSE, TRUE),
        homebutton = list(FALSE, TRUE), layer.name = list(NULL, "incidents_number"), alpha.regions = 0.5, alpha = 1)

Distribution of incidents per incident class group and borough

ggplot(data=fire_data %>% group_by(inc_class_group, inc_borough) %>%  summarise(incident_count = n()), 
       aes(x=inc_borough, y=incident_count, fill=inc_class_group)) +
    geom_bar(stat="identity", position=position_dodge()) +
    geom_text(aes(label=incident_count), vjust=1.6, color="black",
              position = position_dodge(0.9), size=2.5) +
    #scale_fill_brewer(palette="Paired") +
    labs(title = "Borough - Incident Count - Incident Class Group", x = "Borough", y = "Incident Count", fill = "Incident Class Group") +
  theme_minimal()
## `summarise()` has grouped output by 'inc_class_group'. You can override using
## the `.groups` argument.

ggplot(fire_data, aes(x = inc_borough, y = inc_resp_min_qy, color = inc_class_group)) +  # ggplot function
  geom_boxplot() + scale_color_brewer(palette="Dark2") +
  labs(title = "Borough - Incident Class Group - Valid Response Time in Minutes", x = "Borough", y = "Valid Response Time in Minutes", color = "Incident Class Group") +
  theme_grey()

fire_data[which.max(fire_data$inc_resp_min_qy), ]
##                          id al_number               al_location inc_borough
## 16800 230918-M0856-001-0845       856 6 AVE & W 50 ST/ROCK CNTR   Manhattan
##       zipcode pol_prec city_con_dist commu_dist commu_sc_dist cong_dist
## 16800   10111       18             4        105             2        12
##       al_source_desc al_index_desc highest_al_level
## 16800            EMS Initial Alarm      First Alarm
##                                    inc_class     inc_class_group disp_resp_qy
## 16800 Medical - No PT Contact EMS is Onscene Medical Emergencies           18
##        first_ass_datetime  first_act_datetime first_onscene_datetime
## 16800 2023-09-18 04:43:57 2023-09-18 04:46:54    2023-09-18 05:42:34
##        inc_close_datetime inc_resp_sec_indc inc_resp_min_qy engines_assigned
## 16800 2023-09-18 17:53:00                 Y        58.91667                1
##       ladders_assigned others_units_assigned day_number day_type ticket_time
## 16800                0                     0         18  Weekday  69.35 mins
##       time_of_day total_assigned_unit
## 16800   Afternoon                   1
ggplot(fire_data, aes(x = inc_borough, y = engines_assigned, color = inc_class_group)) +  # ggplot function
  geom_boxplot() +
  labs(title = "Borough - Engine Assigned - Incident Class Group", x = "Borough", y = "Engine Assigned", color = "Incident Class Group")

fire_data %>% arrange(desc(engines_assigned)) %>% slice(1:2)
##                      id al_number                    al_location inc_borough
## 1 230916-B3375-005-0342      3375              AVENUE W & E 3 ST    Brooklyn
## 2 230924-B0994-001-0003       994 MARCY AVE AVE & MAC DONOUGH ST    Brooklyn
##   zipcode pol_prec city_con_dist commu_dist commu_sc_dist cong_dist
## 1   11223       61            44        315            21         8
## 2   11216       79            36        303            13         8
##   al_source_desc al_index_desc highest_al_level
## 1          PHONE        Others    2nd-3rd Alarm
## 2          PHONE        Others    2nd-3rd Alarm
##                            inc_class  inc_class_group disp_resp_qy
## 1 Multiple Dwelling 'A' - Other fire Structural Fires           34
## 2 Multiple Dwelling 'A' - Other fire Structural Fires           23
##    first_ass_datetime  first_act_datetime first_onscene_datetime
## 1 2023-09-16 11:30:17 2023-09-16 11:30:30    2023-09-16 11:32:09
## 2 2023-09-24 12:05:45 2023-09-24 12:05:51    2023-09-24 12:08:16
##    inc_close_datetime inc_resp_sec_indc inc_resp_min_qy engines_assigned
## 1 2023-09-16 18:59:35                 Y        2.433333               19
## 2 2023-09-24 09:39:22                 Y        2.900000               18
##   ladders_assigned others_units_assigned day_number day_type   ticket_time
## 1               15                    32         16  Weekend 449.8667 mins
## 2               13                    25         24  Weekend 574.0167 mins
##   time_of_day total_assigned_unit
## 1     Morning                  66
## 2       Night                  56

Both incidents are structural fires whose alarm source was a phone call, and both were covered by local news:

1. 09/16/2023 06:59:35 PM, url of the incident: https://www.cbsnews.com/newyork/news/gravesend-brooklyn-fire-injuries/
2. 09/24/2023 12:05:21 AM, url of the incident: https://abc7ny.com/three-injured-in-fire-bedford-stuyvesant-beford-stuyvesant-brooklyn-fdny/13823430/

ggplot(fire_data, aes(x = inc_borough, y = ladders_assigned, color = inc_class_group)) +  # ggplot function
  geom_boxplot() +
  labs(title = "Borough - Ladders Assigned - Incident Class Group", x = "Borough", y = "Ladders Assigned", color = "Incident Class Group")

fire_data %>% arrange(desc(ladders_assigned)) %>% slice(1:2)
##                      id al_number                    al_location inc_borough
## 1 230916-B3375-005-0342      3375              AVENUE W & E 3 ST    Brooklyn
## 2 230924-B0994-001-0003       994 MARCY AVE AVE & MAC DONOUGH ST    Brooklyn
##   zipcode pol_prec city_con_dist commu_dist commu_sc_dist cong_dist
## 1   11223       61            44        315            21         8
## 2   11216       79            36        303            13         8
##   al_source_desc al_index_desc highest_al_level
## 1          PHONE        Others    2nd-3rd Alarm
## 2          PHONE        Others    2nd-3rd Alarm
##                            inc_class  inc_class_group disp_resp_qy
## 1 Multiple Dwelling 'A' - Other fire Structural Fires           34
## 2 Multiple Dwelling 'A' - Other fire Structural Fires           23
##    first_ass_datetime  first_act_datetime first_onscene_datetime
## 1 2023-09-16 11:30:17 2023-09-16 11:30:30    2023-09-16 11:32:09
## 2 2023-09-24 12:05:45 2023-09-24 12:05:51    2023-09-24 12:08:16
##    inc_close_datetime inc_resp_sec_indc inc_resp_min_qy engines_assigned
## 1 2023-09-16 18:59:35                 Y        2.433333               19
## 2 2023-09-24 09:39:22                 Y        2.900000               18
##   ladders_assigned others_units_assigned day_number day_type   ticket_time
## 1               15                    32         16  Weekend 449.8667 mins
## 2               13                    25         24  Weekend 574.0167 mins
##   time_of_day total_assigned_unit
## 1     Morning                  66
## 2       Night                  56

These are the same two incidents as before.

ggplot(fire_data, aes(x = inc_borough, y = others_units_assigned, color = inc_class_group)) +  # ggplot function
  geom_boxplot() +
  labs(title = "Borough - Other Units Assigned - Incident Class Group", x = "Borough", y = "Other Units Assigned", color = "Incident Class Group")

fire_data %>% arrange(desc(others_units_assigned)) %>% slice(1:2)
##                      id al_number                    al_location inc_borough
## 1 230916-B3375-005-0342      3375              AVENUE W & E 3 ST    Brooklyn
## 2 230924-B0994-001-0003       994 MARCY AVE AVE & MAC DONOUGH ST    Brooklyn
##   zipcode pol_prec city_con_dist commu_dist commu_sc_dist cong_dist
## 1   11223       61            44        315            21         8
## 2   11216       79            36        303            13         8
##   al_source_desc al_index_desc highest_al_level
## 1          PHONE        Others    2nd-3rd Alarm
## 2          PHONE        Others    2nd-3rd Alarm
##                            inc_class  inc_class_group disp_resp_qy
## 1 Multiple Dwelling 'A' - Other fire Structural Fires           34
## 2 Multiple Dwelling 'A' - Other fire Structural Fires           23
##    first_ass_datetime  first_act_datetime first_onscene_datetime
## 1 2023-09-16 11:30:17 2023-09-16 11:30:30    2023-09-16 11:32:09
## 2 2023-09-24 12:05:45 2023-09-24 12:05:51    2023-09-24 12:08:16
##    inc_close_datetime inc_resp_sec_indc inc_resp_min_qy engines_assigned
## 1 2023-09-16 18:59:35                 Y        2.433333               19
## 2 2023-09-24 09:39:22                 Y        2.900000               18
##   ladders_assigned others_units_assigned day_number day_type   ticket_time
## 1               15                    32         16  Weekend 449.8667 mins
## 2               13                    25         24  Weekend 574.0167 mins
##   time_of_day total_assigned_unit
## 1     Morning                  66
## 2       Night                  56

These are the same two incidents as before.

ggplot(fire_data, aes(x = inc_borough, y = total_assigned_unit, color = inc_class_group)) +  # ggplot function
  geom_boxplot() + 
  labs(title = "Borough - Total Units Assigned - Incident Class Group", x = "Borough", y = "Total Units Assigned", color = "Incident Class Group")

fire_data %>% arrange(desc(total_assigned_unit)) %>% slice(1:2)
##                      id al_number                    al_location inc_borough
## 1 230916-B3375-005-0342      3375              AVENUE W & E 3 ST    Brooklyn
## 2 230924-B0994-001-0003       994 MARCY AVE AVE & MAC DONOUGH ST    Brooklyn
##   zipcode pol_prec city_con_dist commu_dist commu_sc_dist cong_dist
## 1   11223       61            44        315            21         8
## 2   11216       79            36        303            13         8
##   al_source_desc al_index_desc highest_al_level
## 1          PHONE        Others    2nd-3rd Alarm
## 2          PHONE        Others    2nd-3rd Alarm
##                            inc_class  inc_class_group disp_resp_qy
## 1 Multiple Dwelling 'A' - Other fire Structural Fires           34
## 2 Multiple Dwelling 'A' - Other fire Structural Fires           23
##    first_ass_datetime  first_act_datetime first_onscene_datetime
## 1 2023-09-16 11:30:17 2023-09-16 11:30:30    2023-09-16 11:32:09
## 2 2023-09-24 12:05:45 2023-09-24 12:05:51    2023-09-24 12:08:16
##    inc_close_datetime inc_resp_sec_indc inc_resp_min_qy engines_assigned
## 1 2023-09-16 18:59:35                 Y        2.433333               19
## 2 2023-09-24 09:39:22                 Y        2.900000               18
##   ladders_assigned others_units_assigned day_number day_type   ticket_time
## 1               15                    32         16  Weekend 449.8667 mins
## 2               13                    25         24  Weekend 574.0167 mins
##   time_of_day total_assigned_unit
## 1     Morning                  66
## 2       Night                  56

These are the same two incidents as before.

ggplot(fire_data %>% group_by(al_source_desc, inc_class_group, inc_borough) %>% summarise(incidents_number = n()),
       aes(x = inc_borough, y = incidents_number, color = al_source_desc)) +  # ggplot function
  geom_boxplot() +
  labs(title = "Borough - Incident Number - Alarm Source", x = "Borough", y = "Incident Number", color = "Alarm Source")
## `summarise()` has grouped output by 'al_source_desc', 'inc_class_group'. You
## can override using the `.groups` argument.

ggplot(fire_data %>% group_by(inc_class_group, inc_borough, day_type) %>% summarise(incidents_number = n()),
       aes(x = inc_borough, y = incidents_number, color = day_type)) +
  geom_boxplot()  +
  labs(title = "Borough - Incident Count - Day Type", x = "Borough", y = "Incident Count", color = "Day Type")
## `summarise()` has grouped output by 'inc_class_group', 'inc_borough'. You can
## override using the `.groups` argument.

ggplot(fire_data %>% group_by(inc_class_group, inc_borough, time_of_day) %>% summarise(incidents_number = n()),
       aes(x = inc_borough, y = incidents_number, color = time_of_day)) +
  geom_boxplot() +
  labs(title = "Borough - Incident Count - Time of Day", x = "Borough", y = "Incident Count", color = "Time of Day")
## `summarise()` has grouped output by 'inc_class_group', 'inc_borough'. You can
## override using the `.groups` argument.

4 Let’s build some models

Before creating any model we have to split the cleaned dataset into a train set and a test set, with about 80% of the whole dataset for the train set and the remaining 20% for the test set.

sample <- sample(c(TRUE, FALSE), nrow(fire_data), replace=TRUE, prob=c(0.8,0.2))
fire_data.train <- fire_data[sample, ]
fire_data.test <- fire_data[!sample, ]
dim(fire_data.train)
## [1] 24594    30
dim(fire_data.test)
## [1] 6134   30
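The split above draws each row independently, so the train share is only approximately 80% and changes from run to run. A possible variant, shown on an invented `toy` data frame, fixes the random seed and samples row indices so that the proportions are exact and repeatable. The seed value 42 is an arbitrary assumption:

```r
set.seed(42)  # arbitrary seed, chosen here only to make the split repeatable

# Toy data standing in for fire_data; values are random placeholders
toy <- data.frame(id = 1:100, y = rnorm(100))

# Draw exactly 80% of the row indices without replacement
train_idx <- sample(nrow(toy), size = floor(0.8 * nrow(toy)))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]   # every row not selected for training

dim(toy_train)
dim(toy_test)
```

Naming the index vector `train_idx` also avoids reusing the name `sample`, which otherwise masks the base function in the workspace.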